H.264 Coal Mine Video Surveillance System Based on Industrial Ethernet

2026-04-06 09:24:36
Abstract: Based on an analysis of the networking trend and real-time requirements of video surveillance systems, an embedded H.264 video encoder supporting industrial Ethernet transmission in coal mines was designed and implemented using the TMS320C6416 DSP and an ARM microcontroller. The system architecture and hardware design are presented, and the DSP encoding algorithm and the NAL interface for RTP/UDP/IP are implemented. Tests show that the system can deliver high-quality networked video transmission.

Keywords: H.264/AVC; embedded system; video surveillance; industrial Ethernet

The analog industrial television monitoring systems currently used in coal mines can no longer meet the needs of integrated mine automation. New video surveillance systems should ensure real-time video acquisition and compression while supporting IP transmission of video streams over industrial Ethernet, realizing the digitization and networking of information across the entire mine.

H.264/AVC, the latest video coding standard, defines a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). Separating the NAL from the VCL serves two main purposes: (1) it defines a clean interface between VCL compression and the NAL transmission mechanism, so the VCL design can be ported to different processor platforms independently of the NAL's data encapsulation format; (2) the VCL and NAL are designed to operate in different transmission environments, so heterogeneous networks do not require the VCL bitstream to be reconstructed and re-encoded. These properties make H.264 well suited to complex and diverse embedded environments, improving network adaptability and thereby helping to guarantee the QoS of video transmission.
Based on the above analysis, this paper proposes an embedded real-time networked video solution that combines a DSP with an embedded ARM microcontroller. The fully embedded design offers high reliability, small size, and strong environmental adaptability. It conforms to the IEEE 802.3u standard and supports direct transmission of digital video streams over coal mine industrial Ethernet while maintaining image quality.

Figure 1 shows the video surveillance system based on industrial Ethernet. Three self-developed KJJ-series explosion-proof industrial Ethernet switches (1, 2, and 3) form a 100 Mbps single-ring redundant industrial Ethernet over fiber. The H.264 encoder devices connect to the switches over RJ45 twisted pair at 10/100 Mbps auto-negotiated speed, and the ring connects to the mine intranet and the surface monitoring host via switch 4. The embedded video encoder is therefore central to implementing the surveillance system.

Considering the characteristics of the coal mine application environment, the encoder abandons the traditional computer expansion-card design in favor of a fully embedded networked design. As shown in Figure 2, the hardware platform is built around TI's high-performance C6416 fixed-point DSP and Samsung's embedded ARM S3C4510B, and supports online software upgrades through an embedded file system. The system consists of four parts: video acquisition, video buffering, video processing, and video transmission. The analog video signal from the CCD camera is converted to digital by an A/D converter, and the digital video is then compressed into the required data stream by the high-speed DSP.
The compressed video stream is read from the DSP's HPI32 host interface by the S3C4510B. The embedded operating system's UDP/IP protocol stack packetizes the data and runs a web server, allowing clients to access the stream over the network; the result is a web-based embedded video compression and encoding system. Figure 2 shows the system structure.

The interface between the DSP and the microcontroller uses the C6416's HPI32 host port; a 32-bit bus is used to ensure data throughput. The C6416's host interface shares pins with its PCI bus, so PCI_EN is tied to 0 to select HPI mode. This makes full use of the DSP's bandwidth, reduces bus conflicts, and relieves pressure on the EMIF bus. Because the DSP is a high-speed device and the S3C4510B a low-speed one, the interface operates in asynchronous slave mode, with the DSP as slave and the S3C4510B as master. Running the DSP as slave eliminates the need for a shared-memory module, saving cost and reducing development difficulty. The remaining control-signal connections are shown in Figure 3. Since the S3C4510B has its own address and data buses, the HPI address strobe /HAS is tied high. The S3C4510B accesses the C6416 through its RCS5 memory bank to transfer data.

The S3C4510B is configured with 64 Mb of DRAM in RAM bank 0, a 2 MB boot ROM, and an 8 MB Intel flash ROM for the file system, located in ROM banks 0 and 1 respectively. The VxWorks real-time operating system, including the TrueFFS file system and an embedded web server, runs on the S3C4510B. Figure 3 shows the C6416-S3C4510B interface.

The video acquisition module converts the analog video signal from the CCD camera into a digital signal.
As shown in Figure 4, a typical CCD camera outputs either an NTSC (or PAL) composite video signal (CVBS) or an S-Video signal, both analog. Philips' SAA7114H TV decoder chip decodes and digitizes the analog signal into CCIR-601-compliant digital video, which is then stored in an AVERLOGIC dedicated video frame FIFO, the AL4V8M440 (8 Mb), for DSP processing. The frame FIFO is mapped into the C6416's CE2 space. A CPLD generates the FIFO control signals from the SAA7114H's status outputs and the corresponding C6416 control signals; it raises a DSP interrupt on detecting the SAA7114H's synchronization signal, notifying the DSP to read a full frame of image data from the video FIFO. Figure 4 shows the video acquisition module.

H.264 primarily uses the QCIF and CIF video formats. QCIF specifies 176 pixels/line and 144 lines/frame for the luma (Y) signal and 88 pixels/line and 72 lines/frame for each of the chroma signals Cb and Cr; CIF specifies 352 pixels/line and 288 lines/frame for Y and 176 pixels/line and 144 lines/frame for Cb and Cr. In both cases each pixel is represented by 12 bits. Since the default H.264 input format is 4:2:0, the acquired video data must be converted so that the stream is stored in the FIFO as 4:2:0 QCIF or CIF; this is the video format conversion step.

The main processor, the TMS320C6416, is Texas Instruments' (TI) high-performance fixed-point DSP, with a clock frequency of up to 600 MHz and a peak processing power of 4800 MIPS. This DSP features a Viterbi decoding coprocessor (VCP) and a Turbo decoding coprocessor (TCP).
The C6416 employs a two-level cache: L1 consists of a 128 Kbit (16 KB) program cache and a 128 Kbit (16 KB) data cache, while the on-chip L2 is 8 Mbit (1 MB). It has two External Memory Interfaces (EMIFs) for glueless connection to asynchronous (SRAM, EPROM) and synchronous (SDRAM, SBSRAM, ZBT SRAM, FIFO) memories, with a maximum addressable range of 1280 MB. The host port interface (HPI) bus width is user-configurable (32 or 16 bits).

Because the image data volume is very large (829440 bytes for a 720×576 image) and the system transfers data frequently, using 64-bit SDRAM significantly improves overall efficiency. A 128 MB bank of 64-bit SDRAM is therefore configured on the EMIFA bus in space CE0. The 8 MB flash ROM holding the program is connected to CE1 on the 16-bit EMIFB bus. The frame FIFO occupies CE2 and stores the pixels acquired by the A/D converter; writing is controlled by the CPLD, and the DSP reads the data for compression. The workflow is shown in Figure 5.

The main functions implemented by the system are: (1) compressing and encoding digital video data using the H.264/AVC standard; (2) controlling each module of the hardware system and managing the transmission of the digital video stream; (3) transmitting the compressed video stream to the microcontroller over the C6416's HPI32 bus; (4) configuring the SAA7114H video A/D converter and the video FIFO over an I2C bus emulated on the C6416's McBSP1; (5) providing an audio expansion interface, which makes it easy to build a networked monitoring system with synchronized audio and video.

The H.264 NAL interface for RTP first classifies the information in the encoded video stream by syntax priority.
Then, based on the specific network transmission environment, it chooses a data classification and packing strategy of appropriate size while preserving the stream's error resilience. On this basis, H.264 provides NAL interfaces for RTP/UDP/IP and for H.223 channel transmission; this system uses the RTP/UDP/IP interface. Within the MTU limit, H.264 splits each coded frame or slice into two output packets with different transmission priorities:
(1) encoded-information packet (high priority): TYPE_HEADER, TYPE_MBHEADER, TYPE_MVD, TYPE_EOS;
(2) texture-information packet (low priority): TYPE_CBP, TYPE_2x2DC, TYPE_COEFF_Y, TYPE_COEFF_C.

H.264/AVC Encoding and Key Technologies

To meet the requirements of different rates, resolutions, and network transports, H.264 provides a variety of profiles and levels. Based on the H.264/AVC specification and the characteristics of the mine monitoring system's information flow, the encoding structure shown in Figure 6 was adopted after testing and analysis. H.264 encoding consists of several parts, including inter prediction, intra prediction, transform, quantization (Q), loop filtering, and entropy coding; the bitstream produced by the encoder is handed to the NAL layer. As Figure 6 shows, after the input frame Fn is divided into macroblocks, the prediction P is formed according to the coding mode: in intra mode, P is derived from the previously encoded, decoded, and reconstructed (unfiltered) samples uF'n; in inter mode, P is derived from motion-compensated prediction from reference frames.
In addition, H.264 uses the following key technologies:
(1) Besides P- and B-frames, H.264 supports switching frames (SP frames) for quickly switching between bitstreams of similar content but different bitrates, and it allows inter prediction from multiple reference frames (1 to 5), which saves 5% to 10% of bitrate compared with a single reference frame.
(2) Inter prediction can use 7 different block sizes, improving coding efficiency by more than 15% compared with prediction from a single 16×16 block size.
(3) Motion estimation uses high-precision sub-pixel motion compensation, supporting 1/4- or 1/8-pixel accuracy: 1/4-pixel prediction is used for the QCIF format and 1/8-pixel prediction for the CIF format.
(4) H.264 offers 32 quantization step sizes, similar to the 31 steps of H.263, but the step size grows at a compound rate of about 12.5% per step rather than by a fixed increment.
(5) Residuals are coded with an integer transform based on 4×4 blocks, eliminating mismatch errors in the inverse transform.
(6) A deblocking filter applied across 4×4 block boundaries removes blocking artifacts, greatly improving the subjective quality of the image.
(7) Two entropy coding schemes are selectable: CAVLC (context-adaptive variable-length coding) and CABAC (context-adaptive binary arithmetic coding); the latter improves coding efficiency by approximately 10%.

DSP Implementation and Optimization of the H.264 Algorithm

The H.264 reference code provided by ITU-T requires not only restructuring but also substantial modification of the core algorithms to meet real-time requirements.
The specific tasks include removing redundant code, standardizing the program structure, adjusting and redefining global and local variables, and adjusting data structures. The CCS development tool has its own ANSI C compiler and optimizer, with its own syntax rules and definitions, so porting the H.264 algorithm to the DSP requires modifying the PC-oriented C code to comply fully with the DSP's C rules. The modifications include: removing all file operations; removing visual-interface operations; allocating and reserving memory sensibly; standardizing data types (because the C6416 is a fixed-point DSP, data must be re-standardized around its native types short (16-bit), int (32-bit), long (40-bit), and double (64-bit), with floating-point operations approximated or implemented in fixed point); defining near and far constants and variables according to the memory allocation; and moving frequently used fields out of large data structures into near data in the DSP's internal memory, reducing reads over the EMIF port and thereby improving speed.

H.264 DSP algorithm optimization exploits the characteristics of the DSP itself to achieve real-time video processing. The following measures were taken:
(1) The CCS compiler optimization options such as -mw, -pm, -o3, and -mt were tuned to the H.264 system's requirements; by repeatedly selecting, matching, and adjusting these options, the performance of loops and nested loops was improved, increasing software parallelism.
(2) Key C code that is called repeatedly and limits encoding speed was rewritten in linear assembly.
Guided by the CCS code analysis tools, key functions such as the inverse integer transform, 1/4-pixel interpolation, and deblocking were rewritten in linear assembly; their running time dropped to 1/2 to 1/3 of the C versions.
(3) The original test model was trimmed and the H.264 encoding code customized: after practical testing and performance analysis, computations with little impact on encoding, such as the peak signal-to-noise ratio calculation, were removed.
(4) Intrinsic functions were used to optimize the C code; intrinsics directly replace complex C statements, reducing instruction cycles and improving code performance.
(5) EDMA was used for large-capacity data transfers; DMA speeds up data movement, reducing CPU accesses and easing the processor's load.
(6) TI's library functions were used throughout the network software, EDMA transfers, and timer handling, improving performance and shortening the code.

H.264 Encoder Explosion-Proof Design and Performance Evaluation

Because the coal mine environment is extremely harsh, the encoder uses an explosion-proof enclosure chosen for its power consumption and on-site safety requirements, in addition to careful attention to the circuit board's electrical characteristics during hardware design. The casing design conforms to technical standards including the "Coal Mine Safety Code," "Coal Mine Design Code," "Requirements for General Explosion-Proof Electrical Equipment for Explosive Atmospheres," "General Technical Requirements for Electrical and Electronic Products for Communication, Detection, and Control in Coal Mines," and "Requirements for Intrinsically Safe Circuits and Electrical Equipment for Explosion-Proof Electrical Equipment for Explosive Atmospheres."
The product is required to pass more than 10 safety tests, including vibration, shock, water spray, damp heat, high- and low-temperature operation, and voltage fluctuation, and to operate safely in underground environments containing explosive gases.

A comparison of H.264, MPEG-4, and H.263++ encoding performance in a 10/100 Mbps industrial Ethernet test environment shows that H.264 delivers superior PSNR: on average about 2 dB higher than MPEG-4 and 3 dB higher than H.263++. At the same encoding rate, the H.264-based system produces clearer, smoother video, meeting the needs of the field.

In conclusion, this paper presents the design of an H.264 encoder for coal mine industrial Ethernet applications, built from a digital signal processor and an embedded network microcontroller, and uses it to construct a networked video surveillance system based on industrial Ethernet. The work also examines key technologies for networking coal mine video surveillance systems, which benefits the informatization and networking of coal mining enterprises and the construction of an integrated IP-based management and control network.