0 Introduction
Images are records of the world made by natural biological or man-made physical observation systems; they are a form of information recorded on physical media. Image information is crucial to humankind's understanding of the world: according to scholars, over 80% of the information humans acquire comes through the eyes. Static images, however, no longer satisfy the demand for video information. As requirements for video data grow, the volume of high-definition, real-time video data increases, and real-time video processing becomes correspondingly more difficult. This paper presents a design method for an embedded real-time video acquisition system based on a DSP+FPGA architecture. Such a system can be widely deployed in settings related to public safety, such as banks, airports, train stations, and shopping malls.
1 Structure of the Real-time Video Acquisition System
Common video capture systems fall into two main types. The first is a capture card based on a single processor (microcontroller, ARM, etc.): simple in structure and easy to implement, but unable to process video data in real time, so an external processor must run the actual video-processing algorithms, which raises cost and complicates upgrades and maintenance. The second is an embedded capture card based on a master-slave processor pair (ARM+DSP, FPGA+DSP, etc.): highly integrated, easy to maintain and upgrade, and able to meet capture requirements while running specific video algorithms at lower cost. This paper therefore presents a design scheme for an embedded video capture system based on DSP+FPGA. Its system structure block diagram is shown in Figure 1.
2 System Hardware Design
The design concept of this system is to acquire video signals using an analog video camera, and then use an analog-to-digital converter chip SAA7111A to convert the analog PAL video signal into a YUV4:2:2 digital video signal. The design uses an FPGA chip EP1C6Q240C8 as a coprocessor to perform video signal buffering and video frame synthesis. A dual-RAM ping-pong structure is used to ensure the integrity of the video frames. After preprocessing the video data, the data is passed to the DSP to perform specific video processing algorithms (such as compression). Finally, the processed video data is transmitted and stored. Simultaneously, the main processor DSP is also responsible for initializing and configuring the video acquisition chip. The system hardware structure is shown in Figure 2.
2.1 Video Acquisition Module
A crucial step in designing a video acquisition system is converting external optical signals into electrical signals, and then using a dedicated video conversion chip to turn the analog video signal into a digital one. This design uses an analog CMOS camera together with Philips' high-performance SAA7111A video A/D converter chip.
The SAA7111A is a high-performance video input processing chip from Philips. It has four analog video inputs and can accept four CVBS signals or two S-video (Y/C) signals; users can programmatically select among the four inputs to create different operating modes. It can automatically detect and separate the horizontal and vertical sync signals, automatically detect a 50 Hz or 60 Hz vertical frequency, and switch between the PAL and NTSC standards accordingly. It can also process luminance and chrominance signals from different input formats, enabling real-time on-chip control of brightness, chrominance, and saturation. Its on-chip registers are configured through an I2C bus interface; of its 32 control registers, 22 are programmable. Digital video is output on a 16-bit VPO bus, with output formats including 12-bit YUV 4:1:1, 16-bit YUV 4:2:2, 8-bit CCIR-656, 16-bit RGB565, and 24-bit RGB888; the output signals also provide various synchronization signals such as the sampling clock, horizontal sync, and vertical sync.
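The register setup described above can be sketched in C. This is a minimal illustration only: the 7-bit slave address and all subaddress/value pairs below are placeholders, not a verified SAA7111A configuration, and a real driver would go through the DSP's I2C peripheral rather than the fake bus used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAA7111A_I2C_ADDR 0x24  /* assumed 7-bit slave address (hypothetical) */

/* A tiny fake I2C target that records writes, standing in for the DSP's
 * I2C peripheral driver during this sketch. */
static uint8_t regs[32];

static void i2c_write_reg(uint8_t dev, uint8_t subaddr, uint8_t value)
{
    (void)dev;              /* a real driver would address the bus here */
    regs[subaddr & 0x1F] = value;
}

/* Illustrative init sequence: each pair is {subaddress, value}.
 * The values are placeholders, not datasheet settings. */
static void saa7111a_init(void)
{
    static const uint8_t cfg[][2] = {
        {0x02, 0xC0},  /* hypothetical: analog input control      */
        {0x08, 0x88},  /* hypothetical: sync control               */
        {0x10, 0x40},  /* hypothetical: output format = YUV 4:2:2  */
    };
    for (size_t i = 0; i < sizeof cfg / sizeof cfg[0]; i++)
        i2c_write_reg(SAA7111A_I2C_ADDR, cfg[i][0], cfg[i][1]);
}
```

In the real system this sequence would be issued by the DSP over its I2C bus during initialization, as described in section 2.3.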
The SAA7111A converts the PAL analog video signal into 16-bit YUV 4:2:2 digital video data, 625×720×16 bit per frame. Roughly 25 lines per field are consumed by vertical retrace, leaving 576 active lines, so each frame carries 576×720×16 bit of effective data.
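The figures above imply a substantial sustained data rate, which is worth making explicit since it motivates the buffering scheme in the next section. The arithmetic below uses only the frame geometry stated in the text (PAL at 25 frames/s is assumed):

```c
#include <assert.h>

/* PAL frame geometry from the text: 625 total lines, 576 active,
 * 720 pixels per line, 16 bits (2 bytes) per pixel, 25 frames/s. */
enum { LINES_TOTAL = 625, LINES_ACTIVE = 576, PIX_PER_LINE = 720,
       BYTES_PER_PIX = 2, FPS = 25 };

static long frame_bytes(int lines)
{
    return (long)lines * PIX_PER_LINE * BYTES_PER_PIX;
}
```

The effective payload is about 829 kB per frame, i.e. roughly 20 MB/s of continuous video data that the front end must absorb.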
2.2 Video Front-End Processing Module
The amount of digitized video data is enormous. To guarantee the integrity and real-time delivery of the video data, a dedicated video front-end processing module was designed; its main functions are video data buffering, video frame synthesis, ping-pong operation, and communication with the DSP. Because an FPGA can be reprogrammed repeatedly, it simplifies the system, reduces board size, eases maintenance, and enables straightforward upgrades; this design therefore uses Altera's EP1C6Q240C8 for the video front-end processing.
Because the video conversion chip provides no address signal, an address generator must be designed inside the FPGA so that the data can be stored at the correct locations. The SAA7111A provides four key signals for this purpose: LLC (the line-locked reference clock), HREF (horizontal reference), VREF (vertical reference), and RTS0 (used here as the odd/even field flag).
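The address generator's behavior can be modeled in C as a counter driven by those signals: the write address advances on each LLC clock while HREF marks active video, and VREF restarts addressing at the top of a field. This is a behavioral sketch only; the signal polarities are assumptions, and the real implementation would be HDL inside the EP1C6Q240C8 following the SAA7111A datasheet timing.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral model of the FPGA address generator: one call = one LLC edge. */
typedef struct {
    uint32_t addr;            /* current write address in the field buffer */
} addr_gen;

static void addr_gen_tick(addr_gen *g, int href, int vref)
{
    if (vref)                 /* start of field: restart addressing */
        g->addr = 0;
    else if (href)            /* active video: advance one sample per clock */
        g->addr++;
}
```

Note that the address does not advance during horizontal blanking (HREF low), so samples from consecutive active lines land contiguously in the buffer.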
Because the PAL standard uses interlaced scanning, the acquired video data is split into even and odd fields, while video image processing operates on complete frames; the two fields must therefore be synthesized into one frame. This is done by interleaving their lines in memory: even-field address = base address 0 + offset address; odd-field address = base address E + offset address. A schematic of the frame-synthesis operation is shown in Figure 3.
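The interleaving can be expressed as a simple address computation. In this sketch the even field is assumed to supply the even-numbered frame lines (matching "base address 0" above) and the odd field the odd-numbered ones, offset by one line width; the actual PAL field order should be confirmed against the RTS0 flag in the real design.

```c
#include <assert.h>
#include <stdint.h>

enum { LINE_WORDS = 720 };   /* 16-bit samples per active line */

/* Map (field, line-within-field, pixel) to an address in the frame buffer.
 * Even field -> frame lines 0,2,4,...; odd field -> frame lines 1,3,5,... */
static uint32_t frame_addr(int odd_field, int field_line, int pixel)
{
    int frame_line = 2 * field_line + (odd_field ? 1 : 0);
    return (uint32_t)frame_line * LINE_WORDS + (uint32_t)pixel;
}
```

With 288 active lines per field this fills all 576 frame lines, and the "base address" for the odd field is simply one line width above base 0.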
To ensure the real-time performance of the video acquisition system, a dual-RAM ping-pong mechanism is used. Ping-pong operation is widely used in FPGA timing design and is a typical design philosophy that trades area for speed. This structure distributes the input data stream to two data buffers simultaneously via an input data selection unit. In the first buffer cycle, the input data stream is buffered into data buffer module 1; in the second buffer cycle, the input data stream is buffered into data buffer module 2 by switching the input data selection unit, while the data from the first cycle buffered in data buffer module 1 is selected by the output data selection unit and sent to the processing unit for processing; subsequently, in the third buffer cycle, the input and output buffer modules are switched again. This cycle repeats continuously. The specific state machine is shown in Figure 4.
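The ping-pong cycle described above can be modeled in software to show the buffer hand-off. This is a toy model with a small buffer, not the FPGA implementation: each call represents one buffer cycle, during which the input stream fills one buffer while the processing unit consumes the data buffered in the previous cycle.

```c
#include <assert.h>
#include <string.h>

enum { BUF_WORDS = 8 };      /* toy size; a real buffer holds a video field */

typedef struct {
    int buf[2][BUF_WORDS];   /* the two RAM blocks */
    int write_sel;           /* which buffer the input stream fills */
} pingpong;

/* One buffer cycle: fill the write buffer with the incoming frame, hand the
 * previously filled buffer to the processing unit, then swap roles. */
static const int *pingpong_cycle(pingpong *p, const int *frame)
{
    const int *processing = p->buf[p->write_sel ^ 1]; /* last cycle's data */
    memcpy(p->buf[p->write_sel], frame, sizeof p->buf[0]);
    p->write_sel ^= 1;       /* switch the input/output selection units */
    return processing;
}
```

The first cycle's output buffer holds no valid data yet; from the second cycle on, acquisition and processing overlap fully, which is what sustains the real-time data rate.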
The communication module implements a simple handshake: when the DSP finishes processing a frame, it signals the FPGA that it is now idle; when the FPGA's internal module receives this signal, it transmits the next frame of data to the DSP.
2.3 Video Backend Processing Module
This system uses TI's TMS320VC5509A, a high-performance, low-power fixed-point DSP chip. Its internal main clock can reach 200 MHz, for a peak processing rate of 400 MIPS. The DSP has a large on-chip RAM, comprising 32K×16-bit DARAM and 96K×16-bit SARAM, for a total of 128K×16 bits of on-chip storage. It offers abundant on-chip peripherals, including a real-time clock (RTC), a 10-bit ADC, McBSP serial ports, a full-speed USB interface (12 Mb/s), an MMC/SD (multimedia card) interface, and an I2C interface. The processor runs from a low-voltage supply, with a 1.6 V core and 3.3 V I/O, and power consumption as low as 0.2 mW/MIPS.
As the main processor of the video acquisition system, the DSP is responsible for configuring the various interfaces and peripherals and for real-time video processing. The interfaces involved include the clock generator (PLL), the I2C bus interface, the EMIF module, and the USB interface.
Only when all interfaces work in coordination can the normal operation of the system be guaranteed. Among them, the clock generator is responsible for multiplying the external 24MHz crystal oscillator clock to 200MHz as the system operating clock; the I2C bus is responsible for initializing and configuring the video acquisition chip SAA7111A; and the USB interface is responsible for communication with the host computer to realize data transmission.
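Note that 200 MHz is not an integer multiple of 24 MHz, so the PLL must use a multiply/divide pair rather than a plain multiplier. The brute-force search below illustrates the arithmetic; the parameter ranges (multiplier 2..31, divider 1..4) are assumptions for the sketch, not verified 5509A register limits, and the actual values would be programmed via the clock-mode register.

```c
#include <assert.h>

/* Find integer (mult, div) with in_mhz * mult / div == out_mhz exactly. */
static int find_pll(int in_mhz, int out_mhz, int *mult, int *div)
{
    for (int m = 2; m <= 31; m++)
        for (int d = 1; d <= 4; d++)
            if (in_mhz * m == out_mhz * d) {  /* exact, no rounding */
                *mult = m;
                *div  = d;
                return 1;
            }
    return 0;
}
```

For 24 MHz in and 200 MHz out this yields ×25/3, i.e. 24 × 25 / 3 = 200 MHz.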
Given the massive volume of video data and the limits of the DSP's on-chip RAM, this system extends the DSP with a 4M×16-bit SDRAM and a 256K×16-bit FLASH: the SDRAM is mapped into the DSP's CE2 and CE3 spaces, and the FLASH into the CE1 space. Because peripheral interface configuration is generally complex, TI's Chip Support Library (CSL) functions are used to simplify it.
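A quick capacity check shows why the external SDRAM is needed. Using only the figures already stated (4M×16-bit SDRAM, 576×720×16-bit active frames; the buffer layout itself is not specified by the paper):

```c
#include <assert.h>

enum { SDRAM_WORDS = 4 * 1024 * 1024,  /* 4M 16-bit words */
       FRAME_WORDS = 576 * 720 };      /* one active frame, 16-bit words */

/* How many complete frames fit in the external SDRAM? */
static int frames_that_fit(void)
{
    return SDRAM_WORDS / FRAME_WORDS;
}
```

Ten full frames fit in the SDRAM, whereas the 128K×16-bit on-chip RAM cannot hold even a third of one frame (414,720 words), so external memory is unavoidable for frame-level processing.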
Video data typically contains a great deal of redundant information (temporal redundancy, spatial redundancy, etc.) and therefore must be compressed. The goal of video coding is to represent the video information with as few bits as possible while preserving reconstruction quality, removing as much of the redundancy inherent in video image data as possible: spatial redundancy, temporal redundancy, psychovisual redundancy, and statistical (coding) redundancy. Common compression standards include JPEG, MPEG-1, MPEG-2, H.261, and H.263. These algorithms are generally complex and process huge amounts of data; DSPs, with their Harvard architecture and pipelined execution, have a significant advantage in implementing video processing algorithms. The video algorithms are programmed and debugged in the CCS (Code Composer Studio) 2.0 environment and implemented in C, which facilitates cross-platform portability, optimization, and upgrades.
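As a toy illustration of temporal redundancy (this is not the JPEG codec the system actually runs): consecutive frames of a surveillance scene are often nearly identical, so coding the frame-to-frame difference concentrates the information into a few nonzero values that compress far better than the raw pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many samples actually changed between two frames; the rest
 * of the difference signal is zeros, i.e. removable temporal redundancy. */
static size_t nonzero_diff(const uint8_t *prev, const uint8_t *cur, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((uint8_t)(cur[i] - prev[i]) != 0)
            count++;
    return count;
}
```

A real encoder (MPEG/H.26x) goes further with motion compensation, transform coding, and entropy coding, but the underlying idea is the same: transmit only what changed.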
3 Conclusion
The DSP+FPGA real-time video acquisition system designed in this paper uses a dual-RAM ping-pong structure to achieve real-time acquisition; the DSP main processor implements the JPEG compression algorithm, and in-system programming with JTAG is used for online debugging. The system is small, low-cost, low-power, fast, adaptable, and easy to maintain, giving it good application prospects in real-time image processing.