Monolithic DSP Implementation of MPEG Audio Encoding

2026-04-06 07:38:40
Abstract: To develop a cost-effective monolithic fixed-point DSP audio encoder, this paper analyzes in depth the computational and storage requirements of the MPEG audio coding reference algorithm, weighs the requirements of coding quality against processor speed, and, based on computer simulation results, identifies the key aspects of an implementation on a monolithic fixed-point DSP. Based on Analog Devices' ADSP-2181, and fully exploiting its hardware architecture optimized for audio processing, a hardware and software scheme for real-time MPEG audio layer 2 encoding was designed and implemented. Test results show that, with MAC-based filter precision extension and an improved psychoacoustic model algorithm, both coding quality and real-time performance can be guaranteed.

Keywords: compression coding; audio processing; digital signal processing

The MPEG [1] audio compression algorithm is the first international standard for high-fidelity digital audio compression. Since the International Organization for Standardization and the International Electrotechnical Commission adopted the standard at the end of 1992, it has been widely used in digital audio storage, multimedia transmission over the Internet, and digital audio broadcasting (DAB) [2]. The MPEG audio coding algorithm is quite complex, however, with high computational and storage requirements, and because market demand for encoders is small, no dedicated ASIC chip exists so far. The common approach is to implement MPEG audio compression coding on a general-purpose DSP. Yet only a few companies abroad, such as DEC, Philips, and Xingit, have implemented the algorithm on a single DSP; their products are expensive and no source code is available. In China, the second layer of MPEG audio coding has been implemented using two TI TMS320C30 chips [3].
However, using two DSP chips not only complicates the coordination and control circuitry, but also remains expensive once external memory is added. It is therefore worthwhile to study a low-cost software and hardware implementation with proprietary rights.

1 MPEG audio coding principle

MPEG audio coding is a sub-band audio coding algorithm based on the characteristics of human hearing; it belongs to the class of perceptual audio coding methods. The basic structure of a perceptual audio coding algorithm is shown in Figure 1. Depending on whether the encoder emphasizes frequency resolution or time resolution, it can be classified as a sub-band coder or a transform coder. The MPEG audio layer 2 algorithm divides the audio signal into 32 sub-bands in the frequency domain, so it is a sub-band coder.

In Figure 1, the time-frequency mapping, also called the filter bank, maps the input audio signal into subsampled frequency components. Depending on the nature of the filter bank used, i.e., its resolution in the frequency domain, these components are also called sub-band samples or frequency lines.

[align=center]Figure 1: Block diagram of a perceptual audio encoder[/align]

The output of the filter bank, or of a time-frequency transform operating in parallel with it, is provided to the psychoacoustic model to estimate the time-dependent masking threshold. The psychoacoustic model exploits known simultaneous masking effects, including the masking characteristics of tonal and non-tonal components. Using forward and backward temporal masking effects can further improve the accuracy of the threshold estimate. Sub-band samples or frequency lines are quantized and encoded under the criterion that the spectrum of the quantization noise stay as close as possible to the masking threshold, minimizing the audibility of the noise introduced by quantization.
Depending on complexity requirements, block companding or entropy coding analysis and synthesis methods can be used. Frame packing combines the quantized and encoded outputs with the relevant side information in the format specified for the decoder.

2 Encoding quality and DSP speed

The key to implementing MPEG audio encoding on a single ADSP-2181 lies in solving two problems: ensuring audio coding quality, and fully exploiting the DSP's processing speed. The two requirements often conflict, so an optimal balance must be found.

Generally speaking, the quality of an MPEG audio encoder is determined mainly by the quality of its psychoacoustic model. This conclusion no longer holds, however, for an implementation on a single 16-bit fixed-point DSP. Analysis shows that finite-word-length effects become the main factor limiting coding quality. In particular, truncation in the analysis filter bank introduces noise 33 times greater than the quantization error of the 16-bit A/D conversion, while the finite-length representation of the window coefficients reduces the filter response, whose sidelobe attenuation is originally as high as 96 dB, to less than 70 dB. To ensure coding quality, the analysis filter bank must therefore be computed with extended precision.

Regarding speed, the first thought is to use a fast algorithm, and we did try a fast algorithm for the sub-band filtering [4]. Practice showed, however, that these fast algorithms do not perform well on DSPs, for three reasons: (1) they count only additions and multiplications, while ignoring operations such as assignment and addressing.
On a DSP where every instruction executes in a single cycle, however, the number of multiplications and additions matters less than these other operations; (2) they take no account of DSP hardware characteristics, so they cannot fully exploit the parallelism of the DSP's multiply-accumulate (MAC) unit; (3) the ADSP-2181 is optimized for 16-bit arithmetic, and when extended precision is required the computational load of these algorithms grows by orders of magnitude.

Based on the above analysis of quality and speed requirements, we selected a polyphase filter bank implementation suited to the DSP's multiply-accumulate instructions and adopted a precision-extension method based on the MAC structure, which effectively resolves the conflict between coding quality and DSP speed. In addition, the input of sampled data, the psychoacoustic model, and the scale factor coding were adapted to the ADSP-2181, reducing the computational load and ensuring real-time operation.

3 Software design of the algorithm

Software design is the core of the single-chip DSP implementation of MPEG audio coding; the requirements on coding quality and speed must be met through careful design of the DSP software.

(1) Precision extension based on the MAC structure

The analysis filter bank of MPEG audio coding can be implemented in several ways; the polyphase structure is the one recommended by the MPEG standard. Its mathematical representation is

[align=center]S(k) = Σ_{i=0..63} M(k,i)·Y(i),  k = 0, 1, ..., 31
Y(i) = Σ_{j=0..7} C(i+64j)·x(i+64j),  M(k,i) = cos[(2k+1)(i−16)π/64]   (1)[/align]

where x is the 512-sample input window, C the window coefficients, and S(k) the 32 sub-band samples. Analysis shows that a double-word extension of the windowed sums Y(i) reduces the noise caused by truncation by a factor of 33. Since the ADSP-2181 supports only 16-bit multiply-accumulate operations, however, equation (1) must be transformed. The DSP's multiply-accumulate structure can then be used; the computational load only roughly doubles, and the storage requirement grows by only 64 words.
(2) Organization of the input data

The organization of the input data must consider both the convenience of obtaining raw audio data from the analog-to-digital converter and the storage of the data in on-chip RAM, where it serves as input to both the polyphase filter bank and the FFT of the psychoacoustic model. Each pass of the polyphase filter bank shifts in 32 new audio samples and shifts out the 32 oldest:

[align=center]X(i) = X(i−32), i = 511, 510, ..., 32
X(i) = next input audio sample, i = 31, 30, ..., 0[/align]

The ADSP-2181, however, is not suited to data shifting: each assignment takes two instructions, so each analysis-filter pass would cost 1024 instruction cycles. If the ADSP-2181's multichannel autobuffered serial port and indirect addressing are exploited and the input data are organized appropriately, a sliding-window method can shift data in and out instead, as shown in Figure 2.

[align=center]Figure 2[/align]

To ensure continuity across frame boundaries, the sliding-window technique requires the input buffer to be a circular buffer long enough to hold two frames of audio input. While the DSP processes one frame, input data for the next can be buffered, eliminating the data-movement overhead. At the same time, the organization of the input data must suit the FFT of the psychoacoustic model, which uses the ADSP-2181's bit-reverse addressing mode. Since FFT computation and input buffering occur simultaneously, the pointers used for the FFT need bit-reverse addressing while the input-buffer pointers must not, or the input audio data would be scrambled.
The ADSP-2181 provides exactly this capability: its first group of address pointers, I0–I3, supports bit-reverse addressing, while the second group, I4–I7, is unaffected by it. Pointers for input buffering are therefore taken from the second group, and pointers for the FFT from the first.

(3) Improvement of the psychoacoustic model

One difficulty in implementing the psychoacoustic model on a DSP is the large number of logarithm operations involved. Although a logarithm can be approximated by a polynomial, the resulting computational load makes that an unwise choice. In the improved model, the FFT output is not converted to the logarithmic domain immediately; instead, the masking-effect curve is approximated by a piecewise linear function in the linear domain. For simplicity, the segmentation is kept consistent with the standard, and the exponential is approximated by the first term of its polynomial expansion. Although this is fairly crude, the analysis above shows that the psychoacoustic model is not the main quality bottleneck in a 16-bit fixed-point implementation, so it is acceptable. After the masking threshold is obtained, a conversion from the linear to the logarithmic domain is still needed to compute the signal-to-mask ratio for bit allocation. Here an approximation using the ADSP-2181's shifter is adopted: the EXP instruction extracts the exponent of a two's-complement fraction, and for an energy value each exponent bit corresponds to about 3 dB. Multiplying the exponent by 3 therefore gives an approximate dB value, with the influence of the mantissa ignored.

(4) Encoding of the scale factors

The MPEG audio coding standard defines 63 scale factors in total, but not all of them can be represented as 16-bit binary numbers.
If double words were used for precision extension, quantization would incur the huge overhead of double-word division. Therefore only the subset of scale factors that can be represented exactly as 16-bit two's-complement fractions is used, namely those whose index is a multiple of 3 and at most 45. With this subset, the scale factor code need no longer be found by comparison; it is obtained directly from the exponent of the sub-band's maximum amplitude, which simplifies scale factor encoding.

(5) Software simulation results

With the above algorithmic improvements, and following the characteristics of the ADSP-2181 and the MPEG standard, the software was simulated using AD's development tools. Table 1 lists the computational and storage requirements of each module as estimated from the simulation, which used a 48 kHz sampling rate, stereo encoding, a 1 kHz sine-wave input, and an output bit rate of 192 kbit/s. As Table 1 shows, the ADSP-2181's performance is fully utilized. Under these conditions the signal-to-noise ratio of the decoded output reaches about 80 dB, so the algorithmic improvements are quite effective.

[align=center]Table 1: Computational and storage requirements of each module[/align]

4 Hardware design

The hardware block diagram is shown in Figure 3. The basic functions of each module are as follows.

DSP core: besides running the entire encoding algorithm, it initializes the analog-to-digital conversion circuit, selects the sampling clock through the auxiliary control circuit, and receives encoding parameters from the host through the interface circuit.
Auxiliary control circuit: implemented with an FPGA and auxiliary circuits; it handles clock generation, FIFO status monitoring, address decoding, etc.

Output buffer: temporary storage for the encoded bit stream, providing a fully asynchronous output interface. This is particularly useful in applications that require lip-sync between image and sound.

External memory: includes the BDMA space and the I/O space.

Analog-to-digital conversion circuit: digitizes the sound and connects directly to serial port 0 of the DSP. The sampling frequency is determined by an externally supplied clock at 256 times the sampling rate, and the circuit must be initialized before normal operation.

Interface circuit: divided into two parts, the encoded-output interface and the host interface. The host interface uses an RS-232 interface chip to connect DSP serial port 1 to the host serial port; the DSP implements asynchronous serial communication with interrupts and its internal timer.

References
[1] ISO/IEC 11172-3:1993. Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s. Part 3: Audio
[2] Brandenburg K, Dehry YF, Johnston JD, et al. ISO-MPEG-1 audio: a generic standard for coding of high-quality digital audio. J Audio Eng Soc, 1994, 42(10): 780–791
[3] Wang Jianxin, Dong Zaiwang, Yin Rifangqiang. Research and real-time implementation of the MPEG audio coding algorithm. Journal of Tsinghua University, 1997, 37(10): 45–48
[4] Konstantinides K. Fast subband filtering in MPEG audio coding. IEEE Signal Processing Letters, 1994