PCI bus high-speed data transmission system design
2026-04-06 06:37:25
Abstract: Using the design of a data acquisition card as an example, this paper introduces the basic process of PCI bus data transmission. It presents the overall system design, the PCI interface communication method and the driver implementation, and focuses on the factors that affect the transfer rate in PCI data transmission.

Keywords: PCI bus, WinDriver, Direct Memory Access

1 Introduction

Bus expansion technology has enabled the rapid adoption of computer-based data acquisition. PC-based data acquisition systems are among the most widely used data acquisition systems, not only in power equipment monitoring, telemetry and remote sensing, but also in sonar, radar, communications, geology, medical devices and other fields. High-speed data transmission has long been a key research topic in computing, because it is the foundation that allows computers to sense and control the external world.

The PCI bus protocol, proposed by Intel in 1992, is a low-cost, high-performance local bus designed to meet the requirements of high-speed data input/output. The peak transfer rate of the 32-bit PCI bus is 132 MByte/s (33 MHz × 4 bytes). Because of this performance, the PCI bus is widely used for data transfer in PCs. High-speed data transmission requires raising the bandwidth utilization of the PCI bus so that the achieved rate comes as close as possible to this peak. This paper details the key technologies in the design of a high-speed PCI data transmission system and uses them to implement a high-speed data acquisition system.

2 Hardware Circuit Design

The block diagram of the system's hardware circuit board is shown in Figure 1. The PCI9054 bus controller handles communication between the local bus and the PCI bus.
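The 132 MByte/s ceiling on the PCI side of the PCI9054, and how close each rate measured later in the paper comes to it, follow from simple arithmetic. The short Python sketch below reproduces that arithmetic. It assumes the nominal 33 MHz clock and 4-byte width from the introduction and decimal megabytes (so 33 MHz × 4 bytes gives exactly 132); the measured rates are copied from the text, and the function names are illustrative only.

```python
# Nominal 32-bit, 33 MHz PCI parameters as used in the introduction.
PCI_CLOCK_HZ = 33_000_000
PCI_WIDTH_BYTES = 4
MBYTE = 1_000_000  # decimal MByte, so that 33 MHz x 4 bytes = 132 MByte/s


def peak_rate_mbytes(clock_hz: int = PCI_CLOCK_HZ,
                     width_bytes: int = PCI_WIDTH_BYTES) -> float:
    """Theoretical ceiling: one data phase of width_bytes on every clock."""
    return clock_hz * width_bytes / MBYTE


def utilization(measured_mbytes: float, peak_mbytes: float) -> float:
    """Fraction of the theoretical peak that a measured rate achieves."""
    return measured_mbytes / peak_mbytes


# Rates reported in Sections 3 and 4 of the paper (MByte/s).
MEASURED = {
    "conventional FSM, polling": 68.0,
    "improved FSM, polling": 88.0,
    "distributed mode": 91.2,
    "continuous mode": 95.3,
    "kernel mode, interrupt-triggered": 105.4,
    "continuous mode, interrupt": 114.2,
}

if __name__ == "__main__":
    peak = peak_rate_mbytes()
    print(f"peak: {peak:.0f} MByte/s")
    for label, rate in MEASURED.items():
        print(f"{label:34s} {rate:6.1f} MByte/s  {utilization(rate, peak):6.1%}")
```

Even the best case reported in the paper (114.2 MByte/s) is about 86.5% of the nominal peak.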
Because the signal timing on the PCI9054 Local bus is relatively complex, implementing the control logic with discrete gates would be difficult, would require many chips, and would limit the scalability and versatility of the board. A CPLD is therefore used to control the Local side of the PCI9054. As the block diagram shows, the upper five data lines of the PCI9054 local data bus are connected to CPLD pins. Over these five lines the CPLD can receive data, which it decodes into control information, and can send data that causes the PCI9054 to raise doorbell and mailbox interrupts.

The PCI interface can move data much faster than the A/D converters produce it, so the design buffers the samples in a FIFO, which also allows several data streams to be acquired simultaneously. The PCI9054 local bus supports clocks of up to 50 MHz, but a 40 MHz clock is used in this design to match the A/D converter. All address lines and control signals of the PCI9054 are connected only to the CPLD, which uses them to determine its operating state and to generate the various control signals. The FIFO is a TI SN74V3690, a part well suited to networking, video, signal processing, telephony, data communication and other applications that must buffer large amounts of data and match buses of unequal widths.

[align=center]Figure 1 Hardware Circuit Block Diagram[/align]

3 DMA Transfer Mode Design

3.1 Improved State Machine Design

The state transition diagram is shown in Figure 2. This state machine has three states: idle, wait and transfer. It completes data transfers with the PCI9054 reliably and with stable performance, and it is the conventional approach to PCI9054 local-bus state machine design. In actual tests in user mode, with DMA completion detected by polling, this state machine stably reaches a DMA transfer rate of 68 MByte/s.
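The cost of the extra wait state, which the next paragraph identifies by comparing timings, can be illustrated with a clock-level model. The Python sketch below is not the CPLD logic itself, which the paper does not list. It steps a simplified Local-bus state machine one clock at a time, with ADS# and BLAST# reduced to events, and counts the clocks spent per burst with and without a WAIT state. The one-address-clock-per-burst timing, the burst length of 8 words and the burst count are illustrative assumptions.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT = auto()      # only present in the conventional three-state machine
    TRANSFER = auto()


def run_bursts(n_bursts: int, words_per_burst: int,
               use_wait_state: bool) -> tuple[int, int]:
    """Clock a simplified Local-bus state machine through n_bursts bursts.

    Simplifying assumptions: ADS# is seen in IDLE on the clock right after
    the previous burst ends (one address clock per burst), one data word
    moves on every TRANSFER clock, and BLAST# marks the last word.
    Returns (total_clocks, words_transferred).
    """
    if n_bursts < 0 or words_per_burst < 1:
        raise ValueError("need n_bursts >= 0 and words_per_burst >= 1")
    state = State.IDLE
    clocks = words = bursts_done = remaining = 0
    while bursts_done < n_bursts:
        clocks += 1
        if state is State.IDLE:
            # ADS# asserted: address phase of a new burst.
            remaining = words_per_burst
            state = State.WAIT if use_wait_state else State.TRANSFER
        elif state is State.WAIT:
            # Conventional machine: one clock spent setting up outputs.
            state = State.TRANSFER
        else:
            words += 1
            remaining -= 1
            if remaining == 0:  # BLAST# was asserted on this data phase
                bursts_done += 1
                state = State.IDLE
    return clocks, words


if __name__ == "__main__":
    for wait in (True, False):
        clocks, words = run_bursts(1000, 8, use_wait_state=wait)
        print(f"wait state={wait}: {words / clocks:.3f} words per clock")
```

With 8-word bursts the model gives 0.800 words per clock with the wait state and 0.889 without it, an 11% gain from removing one clock per burst. The model only illustrates the mechanism. It does not try to reproduce the measured 68 and 88 MByte/s figures, which also depend on the host side of the transfer.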
[align=center]Figure 2 General State Transition Diagram[/align]

The peak rate of the PCI bus is 132 MByte/s, yet DMA with the state machine above reaches only 68 MByte/s, which points to substantial inefficiency in the local-bus timing. Comparing this state machine's timing with the standard DMA timing given in the PCI9054 documentation shows that, after ADS# goes low, the conventional machine first enters the wait state to set up its signal lines and only then enters the transfer state. Every transfer therefore wastes one clock cycle at its start, which is a significant cost in high-speed transfers and especially in DMA. In the PCI9054's standard DMA timing, by contrast, data transfer begins immediately after ADS# goes low with no wait cycle, so the wait state can be omitted. Only two states are then needed to generate the timing, idle and data transfer, which simplifies the state transitions. However, because the PCI9054 also performs operations other than DMA, such as reading and writing control words and status flags, the CPLD must still decode addresses. This design therefore adopts a new state machine, whose state transition diagram is shown in Figure 3. Measured on the same machine in user mode with polling, the new state machine stably reaches a DMA transfer rate of 88 MByte/s, which is 20 MByte/s higher than the original state machine under the same conditions. This substantial gain shows that removing the wait state is effective.

[align=center]Figure 3 Improved State Transition Diagram[/align]

3.2 DMA Transfer Mode Selection

The PCI9054 supports two DMA transfer modes: continuous mode and distributed mode.
Continuous mode (called Block mode in the PCI9054 documentation) is the ordinary DMA mode: the PCI-side physical memory addresses must be contiguous, and the Local-side address must either be contiguous or stay fixed (for example, when data is read from a FIFO the address never changes). Distributed mode (Scatter/Gather mode in the PCI9054 documentation) uses a linked list of descriptors, each holding the register values for one block, so that blocks with non-contiguous physical addresses and varying sizes can be transferred. Through its descriptor pointer register, the PCI9054 automatically fetches each descriptor from PCI or Local memory, loads it into its DMA registers and starts the transfer, which allows more data to be moved in one DMA operation. Because the registers must be reloaded after each block, however, distributed mode is slightly slower than continuous mode. This design compared the two modes: under the same conditions, the measured DMA rate was 95.3 MByte/s in continuous mode and 91.2 MByte/s in distributed mode, so continuous mode is about 4 MByte/s faster. However, a continuous-mode DMA covers only one contiguous buffer per start, which limits the amount of data per transfer and is a significant drawback for large transfers. Because distributed mode reloads its registers from the descriptor table during the transfer, it can move a much larger amount of data in one operation, and the PCI9054 performs this reloading by itself without driver intervention. A much larger transfer size is therefore obtained at the cost of a slight reduction in speed. Distributed transfer is a distinctive DMA feature of the PCI9054.
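A distributed-mode transfer is driven by a chain of descriptors in memory. The Python sketch below packs such a chain for a set of non-contiguous host buffers feeding from a fixed Local (FIFO) address. The descriptor layout (four little-endian 32-bit words: PCI address, Local address, byte count, next-descriptor pointer, 16-byte aligned) and the flag bits in the low nibble of the next-descriptor pointer (bit 0 descriptor in PCI space, bit 1 end of chain, bit 2 interrupt after terminal count, bit 3 direction, 1 = Local to PCI) follow our reading of the PLX PCI 9054 data book and should be checked against it. The function name and all addresses in the example are hypothetical.

```python
import struct

DESC_SIZE = 16                # four 32-bit words per descriptor, 16-byte aligned
DPR_DESC_IN_PCI = 1 << 0      # next descriptor is in PCI (host) memory
DPR_END_OF_CHAIN = 1 << 1     # this is the last descriptor
DPR_INT_AFTER_TC = 1 << 2     # interrupt when this block's byte count reaches zero
DPR_LOCAL_TO_PCI = 1 << 3     # direction: Local bus -> PCI (card -> host memory)


def build_sg_chain(table_base: int, pci_blocks: list[tuple[int, int]],
                   local_addr: int, interrupt_at_end: bool = True) -> bytes:
    """Pack a distributed-mode (scatter/gather) descriptor chain.

    table_base: PCI physical address at which the returned bytes will be placed.
    pci_blocks: (physical address, length in bytes) of each host buffer fragment.
    local_addr: Local-side address; a FIFO keeps the same address for every block.
    Each descriptor: PCI address, Local address, byte count, next pointer | flags.
    """
    if table_base % DESC_SIZE:
        raise ValueError("descriptor table must be 16-byte aligned")
    if not pci_blocks:
        raise ValueError("need at least one block")
    table = bytearray()
    for i, (pci_addr, length) in enumerate(pci_blocks):
        flags = DPR_DESC_IN_PCI | DPR_LOCAL_TO_PCI
        if i == len(pci_blocks) - 1:
            next_ptr = 0
            flags |= DPR_END_OF_CHAIN
            if interrupt_at_end:
                flags |= DPR_INT_AFTER_TC
        else:
            next_ptr = table_base + (i + 1) * DESC_SIZE
        table += struct.pack("<4I", pci_addr, local_addr, length, next_ptr | flags)
    return bytes(table)
```

Assuming 4 KByte pages with no two pages physically adjacent, the 32 MByte distributed-mode transfer reported in the measurements would need 8192 descriptors, that is, a 128 KByte table. Enabling distributed mode, holding the Local address constant for the FIFO, and loading the first descriptor pointer are done through the channel's DMAMODE and DMADPR registers, which this sketch does not cover.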
In actual measurements on a 1 GHz Celeron machine with 256 MByte of memory, under the same conditions, continuous mode could transfer at most 4 MByte in one operation, whereas distributed mode could transfer up to 32 MByte, eight times as much. Distributed mode is therefore the better choice when a large amount of data must be moved in one operation and the speed requirement is somewhat less strict.

3.3 Termination Method Selection

The end of a DMA transfer can be detected by polling or by interrupt. In polling mode, after DMA is started the software repeatedly reads bit 4 of register DMACSR0 (DMACSR0[4]); a value of 1 indicates that the transfer is complete. In interrupt mode, the DMA completion interrupt is enabled when DMA is started; when the interrupt arrives, the interrupt handler clears it and the transferred data is then read. Under the same conditions, the continuous-mode DMA rate is 95 MByte/s with polling and 114.2 MByte/s with interrupts, so interrupt mode is nearly 20 MByte/s faster. Since the maximum rate of the PCI bus is 132 MByte/s, 114 MByte/s is about 86% of the theoretical peak, close to the limit of the bus.

4 Driver Design

4.1 Driver Introduction

In the broadest sense, a "driver" is a set of functions that operate a hardware device. On any operating system, a device driver must cooperate with the underlying system code. To ensure security, stability and portability, Microsoft's Windows operating systems restrict applications' access to hardware resources, prohibiting direct access to physical memory, I/O ports and interrupt handling.
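Because Windows keeps applications away from device registers, the polling termination method of Section 3.3 has to go through a register-access routine supplied by the driver layer. The Python sketch below shows that polling logic with the accessors injected as plain callables. The callables, the timeout policy and the FakeRegs test double are hypothetical. The DMACSR0 offset (0xA8) and bit assignments (bit 0 enable, bit 1 start, bit 4 done) reflect our reading of the PCI 9054 register map and should be verified against the data book.

```python
import time
from typing import Callable

DMACSR0 = 0xA8        # DMA channel 0 command/status byte (PCI 9054 register map)
CSR_ENABLE = 1 << 0   # channel 0 enable
CSR_START = 1 << 1    # channel 0 start
CSR_DONE = 1 << 4     # DMACSR0[4]: set when the channel 0 transfer is complete


def start_and_poll(read_reg: Callable[[int], int],
                   write_reg: Callable[[int, int], None],
                   timeout_s: float = 1.0) -> int:
    """Start DMA channel 0, then busy-wait until DMACSR0[4] reads as 1.

    read_reg / write_reg stand in for the driver's register accessors
    (hypothetical here). Returns the number of status reads performed;
    raises TimeoutError if the Done bit is not seen within timeout_s.
    """
    write_reg(DMACSR0, CSR_ENABLE)               # enable the channel
    write_reg(DMACSR0, CSR_ENABLE | CSR_START)   # then start the transfer
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if read_reg(DMACSR0) & CSR_DONE:
            return polls
        if time.monotonic() > deadline:
            raise TimeoutError("DMA channel 0 did not report Done")


class FakeRegs:
    """Test double standing in for the card: Done appears after N status reads."""

    def __init__(self, reads_until_done: int):
        self.reads_left = reads_until_done
        self.writes: list[tuple[int, int]] = []

    def read(self, offset: int) -> int:
        self.reads_left -= 1
        return CSR_ENABLE | (CSR_DONE if self.reads_left <= 0 else 0)

    def write(self, offset: int, value: int) -> None:
        self.writes.append((offset, value))


if __name__ == "__main__":
    regs = FakeRegs(reads_until_done=5)
    print("polls:", start_and_poll(regs.read, regs.write))  # prints "polls: 5"
```

An interrupt-driven version would instead enable the DMA completion interrupt when starting the channel and let the interrupt handler clear it, as Section 3.3 describes.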
4.2 Developing Device Drivers Using WinDriver

1) Driver Development Environment Selection

Many tools are available for developing device drivers, such as the DDK, VToolsD, WinDriver and DriverStudio. This design uses WinDriver, which provides good support for common PCI interface chips (the AMCC and PLX series) and dedicated functions that simplify driver development.

2) User-Mode Driver in This Design

WinDriver drivers can run in two modes: user mode and kernel mode. In this design the user-mode driver provides PCI configuration register access; access to the PCI9054's internal registers (local configuration, Runtime, DMA and messaging queue registers); selection among PCI devices; EEPROM access; I/O operations; access to the address spaces mapped onto the Local bus; DMA operations; interrupt handling; and so on.

3) Kernel-Mode Driver in This Design

Data acquisition produces large amounts of data that must be moved to the host quickly: the shorter the transfer time, the more time the application has for processing, which places high demands on communication and data transfer between the acquisition card and the host. The key requirements for this driver are therefore fast, reliable DMA transfer and timely interrupt handling. WinDriver's Kernel PlugIn technology runs user-written code in kernel mode, which gives much faster interrupt response and meets these performance requirements. To improve overall system performance, Kernel PlugIn was added on top of the user-mode driver, and interrupt and DMA handling were moved into kernel mode, which greatly increased processing speed.

4) Discussion of the DMA Rate

Moving data in this system involves more than the DMA transfer itself: interrupts are also needed to trigger each transfer and to signal its completion.
The total transfer time is therefore:

Transfer time = Interrupt time + DMA transfer time

By examining the hardware waveforms with a logic analyzer, the interrupt-triggered DMA transfer rate in kernel mode was calculated to be 105.4 MByte/s, so the continuous data transfer rate can reach 105 MByte/s. The interrupt response time thus also affects the final data transfer rate. As discussed in Section 3, the fastest DMA rate measured in user mode is 95 MByte/s, the fastest DMA rate in kernel mode is 114 MByte/s, and the initial DMA rate was 68 MByte/s, so the DMA rate has been raised by 46 MByte/s over the course of this work.

Conclusion

In modern signal processing, fast data transfer over the PCI bus is the main means of communication between a host and its peripherals. The author designed a 12-channel high-speed signal acquisition device based on the PCI9054 interface chip in the laboratory. Taking into account each stage that affects the data transfer rate, a complete data transfer scheme was proposed, and the continuous data transfer rate was raised to more than 100 MByte/s, approaching the limit of the PCI bus. The main innovation of this paper is the study of the local-bus state machine's state transitions: the improved transition logic raises DMA transfer efficiency by about 20 MByte/s.