Design of an Embedded Network Video Server Based on DM642
Abstract: Embedded network video servers are increasingly valued by security manufacturers and customers for their high reliability and convenient networking. The core technologies in a video server are video compression algorithms, audio compression algorithms, and network transmission protocols. This design uses a dedicated media-processing DSP: development time is short, and because the algorithms are software, product performance can be upgraded continuously at low repeated-development cost.

Keywords: DSP, network video server

Introduction

The mainstream product in today's security-monitoring field is the DVR (Digital Video Recorder), which mainly suits local monitoring with concentrated monitoring points. As demand for remote, distributed monitoring grows, however, embedded network video servers are increasingly valued by security manufacturers and customers for their high reliability and convenient networking. There are many options for a video-server solution, but mainstream products generally choose one of two:

(1) CPU + ASIC. An ARM-based CPU paired with a dedicated media-processing chip. Development time is relatively short, but the ASIC makes the design inflexible: once the product is finalized, it is difficult to change.

(2) A dedicated DSP for media processing. Development time is also short, and because the algorithms are software, performance can be upgraded continuously, lowering repeated-development cost.

Based on these points, this system adopts the second scheme.

I. System Hardware Design

The most important function of the video server is to acquire, compress, and transmit images and sound.
The core technologies in a video server are video compression algorithms, audio compression algorithms, and network transmission protocols. The current mainstream choices are MPEG-4 or H.26x video compression, AAC audio compression or G.72x speech compression, and the TCP/IP protocol suite. The DM642 is a DSP launched by TI for multimedia processing applications. It is based on the C64x core and adds many peripherals and interfaces in a highly integrated 548-pin BGA package. The main peripherals include: three configurable video ports that connect seamlessly to video input, video output, or transport-stream input; a VCXO interpolated control port (VIC); a 10/100-Mbit/s Ethernet MAC (EMAC) with a management data input/output (MDIO) module; a multichannel audio serial port (McASP); an I2C bus module; two multichannel buffered serial ports (McBSPs); three 32-bit general-purpose timers; a user-configurable 16-bit or 32-bit host port interface (HPI16/HPI32); a 66-MHz, 32-bit PCI interface; general-purpose I/O (GPIO) pins; and a 64-bit external memory interface supporting synchronous and asynchronous memory. The system hardware block diagram is as follows. The board carries three video ports, two on-board video decoders and one on-board encoder, 32 MB of synchronous DRAM, 4 MB of Flash memory, and a 10/100 Ethernet port; board options are configured in software through the FPGA's internal registers. The DSP connects to on-board peripherals through the 64-bit EMIF or the 8/16-bit video ports, and each device (SDRAM, Flash, FPGA, UART) occupies one of these interfaces. The EMIF also connects to the expansion connector used to attach a backplane. The on-board video decoders and encoder connect to the video ports and the expansion connectors.
The two decoders and one encoder on the board conform to standard specifications. The McASP can be reconfigured in software as an expansion interface. A field-programmable gate array (FPGA) implements the board's glue logic; it exposes software-visible registers that users can read and write to configure the board. The system hardware design consists mainly of the following parts:

(1) Memory mapping. The C64x series DSP has a unified 32-bit address space, and program code and data can be placed anywhere within it. By default, internal memory begins at address 0x00000000, and part of it can be reconfigured by software as L2 cache instead of mapped RAM. The EMIF (external memory interface) has four independently configurable address regions called chip-enable spaces (CE0 to CE3). SDRAM occupies CE0, while Flash, the UART, and the FPGA are mapped to CE1. The backplane uses CE2 and CE3; a portion of CE3 is configured for synchronous operation, serving the OSD function and other synchronous registers in the FPGA.

(2) EMIF port. The design uses a 64-bit external memory interface. Its address space is divided into the four chip-enable regions and allows synchronous or asynchronous access in 8-, 16-, 32-, and 64-bit widths. The board uses CE0, CE1, and CE3: CE0 drives the 64-bit SDRAM bus; CE1 serves the 8-bit Flash, UART, and FPGA functions; CE3 is set to synchronous operation. CE2 and CE3 are both routed to the backplane interface connector.

(3) SDRAM interface. A 64-bit SDRAM bus is connected in the CE0 space. This 32-MB SDRAM holds program, data, and video information. The bus clock is driven by an external PLL and runs at its optimal rate of 133 MHz; SDRAM refresh is controlled automatically by the DM642.
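The chip-enable layout described above can be summarized in code. The following C sketch is illustrative only: the CE base addresses are the DM642's standard EMIFA values, while the device assignments follow this design (SDRAM on CE0; Flash, UART, and FPGA on CE1; backplane on CE2/CE3). The `emif_ce` helper is hypothetical, not part of any TI library.

```c
#include <stdint.h>

/* DM642 EMIFA chip-enable base addresses (standard C64x memory map). */
#define CE0_BASE 0x80000000u  /* 64-bit SDRAM: program, data, video buffers   */
#define CE1_BASE 0x90000000u  /* 8-bit Flash, dual UART, FPGA async registers */
#define CE2_BASE 0xA0000000u  /* backplane expansion                          */
#define CE3_BASE 0xB0000000u  /* backplane + FPGA synchronous (OSD) registers */

/* Hypothetical helper: return the chip-enable space (0..3) that decodes a
 * given EMIF address, or -1 for internal/on-chip addresses. */
int emif_ce(uint32_t addr)
{
    if (addr >= CE3_BASE) return 3;
    if (addr >= CE2_BASE) return 2;
    if (addr >= CE1_BASE) return 1;
    if (addr >= CE0_BASE) return 0;
    return -1;
}
```

For example, an access at 0x80001000 falls in CE0 (SDRAM), while 0x90000000 falls in CE1 (Flash/UART/FPGA).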
The EMIF clock is generated by an ICS512 PLL with a 25-MHz input clock. The DM642 takes its EMIF clock from the ECLKIN pin by default, but it can also derive the EMIF clock by dividing the CPU clock. The ECLKINSEL0 and ECLKINSEL1 pins, which select the clock source, are sampled during reset and are shared with the EMIF address pins EA19 and EA20.

(4) Flash interface. The design includes 4 MB of Flash, mapped to the low portion of the CE1 space and used mainly to load and store the FPGA configuration data. The CE1 space is configured as 8 bits wide, matching the 8-bit Flash. Because the available CE1 address range is smaller than the Flash, the FPGA generates extended pages; the page is selected through the FPGA's Flash base register, whose default value after reset is 000.

(5) UART interface. The dual UART's registers are mapped to the high portion of the CE1 space, alongside the FPGA's asynchronous registers. Each UART, A and B, decodes an 8-bit address range; CE1 is configured for 8-bit access.

(6) FPGA asynchronous register port. The FPGA provides 10 asynchronous registers located in the high portion of the CE1 space; the functions they implement are not detailed here for reasons of space. The FPGA also exposes synchronous registers in the CE3 address space; these mainly implement the OSD function and some board interconnections.

(7) EMIF buffer/decoder control. The EMIF buffering and decoding functions are implemented in a GAL16LV8D generic array logic device, U15. It performs simple address decoding for the Flash and UART and, together with the buffers, controls CE1, CE2, and CE3.

(8) Video ports/McASP ports.
The design uses three on-board video ports, which can be reconfigured for optional functions such as the McASP and S/PDIF functions of ports 0 and 1. All three DM642 video ports are used: ports 0 and 1 as input ports and port 2 as a display port. In the standard configuration, ports 0 and 1 can be switched to their McASP function and connected to the TLV320AIC23 stereo codec or to the S/PDIF output interface, J9.

(9) Video decoder ports. The configurable video ports 0 and 1 serve as capture inputs, named Capture Port 1 and Capture Port 2, and connect to SAA7115H decoders. The ports pass through CBT switches so they can be selectively disabled for backplane use; the other halves of the ports connect to the on-board McASP. Capture Port 1 accepts a video source through an RCA video socket, J15, and a 4-pin low-noise S-Video connector, J16. The input must be a composite video source such as a DVD player or video camera. The SAA7115H is programmable over the DM642's I2C bus and supports all major composite-video standards (NTSC, PAL, and SECAM), selected by programming the decoder's internal registers.

(10) Video encoder port. Video port 2 drives the video encoder. The signal passes through FPGA U8, which implements advanced functions such as OSD; by default, however, the port connects directly to the SAA7105 video encoder. The encoder can output RGB, HD component video, NTSC/PAL composite video, and S-Video, selected by programming the SAA7105's internal registers over the DM642's I2C bus. The encoder connects to a composite or RGB display: RGB is provided on standard RCA sockets J2, J3, and J4, and the green output, J3, can also drive a composite display.
A 4-pin low-noise S-Video connector, J1, is also available, and a 15-pin high-density DB connector allows the system to drive VGA-type monitors. The design can also produce high-definition TV output, but this requires substituting filters rated for HDTV.

(11) FPGA video functions. The design uses a Xilinx XC2S300E FPGA to implement enhanced video and related functions. In the default mode, the FPGA passes video from video port 2 of the DM642 through to the Philips SAA7105 video encoder. For HDTV the FPGA provides an enhanced clock, and for the OSD function it provides FIFOs that mix the video port 2 data with data written into the FIFOs. The FPGA's FIFOs are accessed from the DM642's EMIF in synchronous mode through the CE3 space.

(12) Ethernet port. In standalone mode, the DM642's Ethernet MAC is selected automatically and routed through CBT switches to the PHY, an Intel LXT971 in this design; the PHY connects directly to the DM642. The 10/100-Mbit port is isolated and brought out to a standard RJ-45 Ethernet jack, J8. The Ethernet address is stored in an I2C serial ROM during manufacturing. The RJ-45 jack carries two indicator LEDs, green and yellow, showing the connection status: steady green indicates link, flashing green indicates activity, and yellow indicates full-duplex operation.

II. System Software Design

1. Data Flow

(1) A frame of video from the input device is captured into the input buffer and resampled from YUV 4:2:2 to YUV 4:2:0 format.

(2) The image data is handed from the input task module to the processing module through an SCOM sequence.

(3) The image data is passed to the JPEG encoding library, which performs motion detection against the previous image; the changed image is compressed into a JPEG and sent to the network task module via an SCOM message.
(4) The network task module keeps a copy of the JPEG. When a peer endpoint on the network connects and requests a "record" session, the network task module sends these images to that endpoint.

(5) If a peer endpoint requests a "playback" connection, the network task module instead receives new JPEG images from that endpoint and sends them, rather than the locally produced images, to the processing task module via an SCOM sequence.

(6) The JPEG image is fed to the decoder, and the decoded image is passed to the output task module via an SCOM sequence.

(7) The output task module resamples the decoded YUV 4:2:0 image to YUV 4:2:2 format and sends it to the display device, which shows the output image.

2. Data Flow Diagram

3. Program Flow

(1) The program uses RF-5 (Reference Framework 5) to integrate the JPEG encoding and decoding libraries. It is structured as six task modules. Four of them are the tasks described above. The fifth is a control task that sends messages to the processing task module through a mailbox; the processing task reads these messages and adjusts the frame rate and image quality as specified. The sixth is a network initialization module, defined in the CDB file, that brings up the network environment; once the network is ready, it creates the network task module described above.
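The chroma resampling step in the data flow above (YUV 4:2:2 in, YUV 4:2:0 out) can be sketched in standard C. This is an illustrative sketch only: it assumes planar data, and the function name is hypothetical; the actual capture driver may deliver interleaved samples. One chroma plane is processed per call (invoke once for Cb and once for Cr); 4:2:0 is produced by averaging each vertical pair of 4:2:2 chroma rows, while the luma plane is copied unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: convert one planar YUV 4:2:2 chroma plane
 * (one chroma sample per 2x1 luma block) to YUV 4:2:0 (one per 2x2
 * block) by averaging vertical pairs of chroma rows. width and
 * height are the luma dimensions and are assumed even. */
void yuv422_to_yuv420(const uint8_t *y_in, const uint8_t *c_in,
                      uint8_t *y_out, uint8_t *c_out,
                      int width, int height)
{
    int cw = width / 2;                           /* chroma width    */
    memcpy(y_out, y_in, (size_t)width * height);  /* luma unchanged  */

    for (int row = 0; row < height / 2; row++) {  /* output rows     */
        const uint8_t *top = c_in + (2 * row) * cw;
        const uint8_t *bot = top + cw;
        for (int col = 0; col < cw; col++)
            c_out[row * cw + col] =
                (uint8_t)((top[col] + bot[col] + 1) / 2);
    }
}
```

The `+ 1` implements rounding rather than truncation when averaging the two chroma rows; the reverse 4:2:0-to-4:2:2 step used by the output task would duplicate (or interpolate) each chroma row instead.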
Before entering the DSP/BIOS scheduler, the program initializes the modules it will use:

① Processor and board initialization: initialize the BIOS environment and CSL; configure 128 KB of L2 as cache; enable L2 caching of the EMIF CE0 and CE1 spaces; set the DMA priority queue length to its maximum; give L2 cache requests the highest priority; and initialize the DMA manager together with the internal and external heaps.

② RF-5 module initialization: initialize the RF-5 channel module and the ICC and SCOM modules used within the RF-5 framework for inter-cell communication and message passing. The modules and channels are built on the internal and external heaps.

③ Capture and display channels: create and start an instance of a capture channel.

(2) After initialization, the system enters the six-task structure managed by the DSP/BIOS scheduler; the tasks exchange messages through the RF-5 SCOM module. The six tasks are:

(A) Input task. The input task obtains video frames from the input device driver using the driver's FVID_exchange call. The captured image is in YUV 4:2:2 format and is resampled to YUV 4:2:0. The input task then sends the processing task a message containing the image data pointer and waits for the processing task's reply before continuing.

(B) Processing task. The processing task contains two cells. The first is a JPEG encoder, which accepts the YUV 4:2:0 image and produces a JPEG at a user-selected compression quality. The second is a JPEG decoder, which receives a compressed JPEG image and decompresses it; the decoded image is in YUV 4:2:0 format. First, if annotation is enabled, the processing task stamps a timecode onto the input image.
The image is then passed to the encoder cell. After the JPEG is generated, the original image undergoes motion detection by comparing pixels at fixed grid points. Both the JPEG image and the motion-detection result are passed to the network task module for further processing. When the network task finishes, it returns a JPEG image to the processing task for decoding and display; this can be the image just sent to the network task or one just received from the network. After decoding, if an annotation grid is configured, the processing task draws the grid on the image. The output image is then sent to the output task in an SCOM message.

(C) Output task. The output task shows the image on the display device using the output driver's FVID_exchange call. The image it receives is in YUV 4:2:0 format and is resampled to YUV 4:2:2. It then waits for the next message from the processing task before continuing.

(D) Control task. The control task manages optional parameters that set the JPEG frame rate and compression quality. It detects changes to parameters in a global structure, "External Control", copies the updated values into a task-local structure, "External Control_prev", and posts a message to the processing task's mailbox. The processing task periodically checks the mailbox and calls the control function of the affected cell.

(E) Network initialization task. This task brings up the network environment; once the network is ready, it creates the network task.

(F) Network task. The network task handles the system's network activity. After initialization it listens on two ports, 3001 and 3002. Port 3001 serves "playback" connections, used when a client wants to send a video stream to the DSP.
Port 3002 serves "record" connections, used when a client wants to receive a video stream from the DSP. The network task then waits for an SCOM message from the processing module announcing a new JPEG image. First, the network module wraps the received JPEG in an in-memory structure that the HTTP server can serve as the client's image file (IMAGE1.JPG). When a "record" connection is active, the network module first checks whether the client has sent any commands; these include setting the date and time, choosing whether to display the date and time, and choosing whether to overlay a grid on the output image. The received JPEG then undergoes activity detection: if the image has changed, it is sent over the "record" connection; otherwise an empty-file indication is sent to keep the client's image synchronized. Next, if a "playback" connection is active, a new JPEG image is read from that connection and replaces the image supplied by the processing module. Finally, the network module returns the JPEG image to the processing module in an SCOM message.

III. Debugging and Conclusion

The system was debugged successfully in the CCS 2.2 integrated development environment from TI (Texas Instruments) and has broad application prospects in security monitoring. The contribution of this work is the use of a dedicated media-processing DSP to build a network video server: development time is short and, because the algorithms are software, performance can be upgraded continuously at low repeated-development cost. TI's CCS compiler is already highly optimized, and combined with the DSP's processing power, most processing algorithms can be implemented in standard C. A video server, however, typically handles multiple video inputs.
Faster encoding means more input channels can be processed, and therefore better cost-effectiveness; fully exploiting the DM642's performance is another key point of this design.
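As an illustration of the kind of standard-C processing the conclusion refers to, the grid-point motion detection performed by the processing task (comparing pixels at fixed grid points against the previous frame) can be sketched as follows. The function name, grid step, and thresholds are assumptions of this sketch, not values from the design.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of grid-based motion detection: sample the luma
 * plane at fixed grid points and count how many differ from the
 * previous frame by more than pix_thresh. The frame is reported as
 * changed when at least count_thresh grid points moved. */
int frame_changed(const uint8_t *prev, const uint8_t *cur,
                  int width, int height,
                  int step, int pix_thresh, int count_thresh)
{
    int moved = 0;
    for (int y = 0; y < height; y += step)
        for (int x = 0; x < width; x += step)
            if (abs((int)cur[y * width + x] - (int)prev[y * width + x])
                    > pix_thresh)
                moved++;
    return moved >= count_thresh;  /* nonzero: send frame on "record" */
}
```

Checking only grid points rather than every pixel keeps the per-frame cost low, which matters when one DSP serves several video inputs; the network task can then skip sending unchanged frames, as described in the data flow.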