Optimized Design and Hardware Implementation of Embedded TCP/IP

With the rapid development of computer networks and electronics, Internet use has become widespread, and a growing number of non-PC devices, such as information appliances and smart meters, need Internet access. There are several ways to connect such devices: run a customized TCP/IP protocol stack on a 51 series microcontroller; use an ASIC that implements TCP/IP, such as the Internet Modem from Analog Devices; or use the TCP/IP protocol stack built into an embedded operating system. Where network speed requirements are modest, a microcontroller implementation is sufficient; where high performance is required, one of the latter two solutions can be chosen.

1. Hardware Structure of Embedded TCP/IP

Figure 1 shows the hardware structure of an embedded TCP/IP system. The CS8900A is a network controller from Cirrus Logic. Its internal frame filter automatically discards invalid frames, which reduces CPU load and makes the CPU's network access more efficient. The CS8900A is operated mainly by setting its internal registers, after which it runs automatically. Because the network interface is an RJ45 connector, an E2023 transmission line transformer is required to couple the network signals.

A TCP/IP protocol stack normally needs a significant amount of RAM to hold TCP packets that are awaiting acknowledgment. If no acknowledgment arrives within a specified time, the packet is retransmitted; once acknowledged, it is released. To reduce RAM usage, a packet awaiting acknowledgment need not be stored if the data needed to regenerate it can be produced again at retransmission time.

Because network traffic is heavy, reading an entire frame into memory before deciding whether to discard it would be inefficient. The decision is therefore made while the frame is being read. The program defines the relative address of each part of the frame, so every byte of the frame can be addressed conveniently, which improves access speed. In the CPU, the PacketRAM variable stores the starting address of the frame. Figure 2 shows the memory allocation for TCP/IP in the CPU, together with the definition and relative position of each byte of the frame in memory.

2. Optimized Design of Embedded TCP/IP

TCP/IP is generally implemented in C or in a mixture of C and assembly language. Reentrant functions and generic pointers increase code size and reduce execution speed; when function pointers are used, either the call tree must be rebuilt manually or the functions called through them must be made reentrant.

2.1 Embedded TCP/IP Input/Output Flow

Like TCP/IP on a PC, embedded TCP/IP uses a layered structure: application layer, TCP layer, IP layer, and network device interface layer. Figure 3 illustrates the flow of input and output data packets and the functions that are called.

During output, the TCP layer first checks the unsend queue. If the queue is not empty, the packet is appended to it; if it is empty, the TCP layer checks whether the peer's window is large enough to receive the packet and then fills in the TCP header. The IP layer selects the network device interface by checking that the destination IP address, ANDed with that interface's subnet mask, matches the interface's own subnet, and then calls the interface's Output function to send the packet.

During input, the Timer() function calls the Input function of each interface.

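The interface selection step on the output path can be sketched as follows. This is a minimal illustration rather than the paper's code: `struct netif`, its fields, `netif_on_subnet`, and `select_netif` are hypothetical names, and addresses are assumed to be stored as 32-bit values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface descriptor; the field names are assumptions. */
struct netif {
    uint32_t ip;         /* interface address, host byte order */
    uint32_t netmask;    /* subnet mask, host byte order */
    struct netif *next;  /* next interface in the list */
};

/* Non-zero when dest lies on the subnet this interface is attached to. */
int netif_on_subnet(const struct netif *nif, uint32_t dest)
{
    assert(nif != NULL);
    return (dest & nif->netmask) == (nif->ip & nif->netmask);
}

/* Return the first interface whose subnet contains dest, or NULL if none does. */
struct netif *select_netif(struct netif *list, uint32_t dest)
{
    struct netif *nif;

    for (nif = list; nif != NULL; nif = nif->next) {
        if (netif_on_subnet(nif, dest)) {
            return nif;
        }
    }
    return NULL;
}
```

When no interface matches, a stack with gateway support would normally fall back to a gateway route. The text mentions gateway support but does not describe that rule, so the sketch simply returns NULL.
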
The IP layer checks the IP version and the IP checksum and determines whether the packet should be forwarded. Then, based on the protocol field in the IP header, it passes the packet to the appropriate higher-layer handler. The TCP layer verifies the TCP checksum and searches the existing sockets for one that can receive the packet. It also checks whether the TCP sequence number is correct, updates the connection state (including releasing acknowledged packets and making TCP state machine transitions), and calls the socket's callback function `recv`.

2.2 Embedded TCP/IP Program Structure

The `Timer` function calls `TCPTimer` to handle TCP packet retransmission and other timed functions, and calls the `Input` function of each interface to receive arriving packets. `Timer` must be called at short, regular intervals (typically every 20 ms); otherwise packet reception and TCP timing stop. As shown in Figure 4, the main program flow is one large loop. The timer interrupt sets the variable `bTimerOut` to true. While processing application layer protocols such as sending data packets, the loop repeatedly checks `bTimerOut`; when it is true, the loop calls `Timer()` and then sets `bTimerOut` back to false.

When an embedded operating system is used, care must be taken that network device driver functions are not re-entered. Taking the NE2K Ethernet card as an example, registers (such as the starting address) must be set before a data packet is copied to the network card buffer. If an interrupt occurs after the registers are set and the driver is re-entered, the register settings are overwritten, and the copy fails after the interrupt returns.

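The flag-polling structure of the main loop can be sketched as follows. This is a minimal illustration under stated assumptions: `timer_isr`, `main_loop_step`, and the `timer_calls` counter are hypothetical names added for the example, and `Timer()` is a stand-in that only counts its calls instead of running `TCPTimer` and the interface `Input` functions.

```c
#include <stdbool.h>

/* Set by the timer interrupt, cleared by the main loop. */
volatile bool bTimerOut = false;

/* Counts Timer() calls so the sketch can be observed; not part of the paper. */
unsigned timer_calls = 0;

/* Stand-in for the stack's Timer(), which would call TCPTimer()
 * and the Input function of each interface. */
void Timer(void)
{
    timer_calls++;
}

/* Body of the periodic timer interrupt (assumed to fire about every 20 ms).
 * It only raises the flag; all protocol work stays in the main loop. */
void timer_isr(void)
{
    bTimerOut = true;
}

/* One pass of the main loop: application work, then the timer check. */
void main_loop_step(void)
{
    /* ... application layer processing, e.g. sending data packets ... */

    if (bTimerOut) {
        Timer();
        bTimerOut = false;  /* cleared after the call, as the text describes */
    }
}
```

A firmware `main` would call `main_loop_step()` in an endless loop. Keeping the interrupt handler down to a single flag write means the protocol code never runs in interrupt context, which also avoids the driver re-entry problem described for the NE2K card.
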
2.3 Optimization of Embedded TCP/IP Running Speed

The main computational load during TCP/IP transmission is concentrated in three steps: the application copies data to RAM; the TCP checksum is calculated; and the packet is copied from RAM to the network device's transmit buffer. For each byte of data, the two copies take approximately 12 × 2 = 24 instruction cycles and the TCP checksum takes 16, for a total of about 40 instruction cycles per byte. With a 12 MHz crystal oscillator, which on a classic 51 core (12 clocks per machine cycle) gives one instruction cycle per microsecond, the maximum transmission speed is therefore 1,000,000 / 40 = 25,000 bytes per second, or about 25 KB/s. A faster CPU or a higher crystal frequency raises this limit.

In addition, reentrant functions should be avoided as far as possible. They are much slower than regular functions, but the program structure sometimes requires them, so speed has to be traded against structure. Other ways to improve speed include: using memory-specific pointers; streamlining the protocol stack by removing computationally intensive but less useful functions (currently, TCP retransmission times are fixed, and there is no congestion window control or IP layer routing); avoiding unnecessary packet copying; and optimizing the checksum calculation and memory copy functions.

3. Embedded Implementation of TCP/IP

TCP/IP protocol implementations are generally embedded in ROM as software and connected through network communication technology to a dedicated embedded gateway, which runs the TCP/IP protocol and provides the user with lightweight network connection and routing functions.

3.1 Memory Management Methods and Implementation Without Redundant Packet Copying

Embedded TCP/IP memory management can use a linked list, allocating memory blocks sized according to the packet. As shown in Figure 5, the linked list links the memory blocks together.

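A block header and allocator consistent with this linked-list scheme might look like the sketch below. Only the fields `used`, `pStart`, `pEnd`, `pNext`, and `pPre` are named in this section; the `size` field, `POOL_SIZE`, `MIN_SPLIT`, `mem_head`, and all function names are assumptions made for illustration, and the C11 alignment keywords are used for portability rather than being specific to any 51 compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u  /* assumed size of the packet memory pool */
#define MIN_SPLIT 16u    /* assumed smallest remainder worth splitting off */

struct MemBlock {
    struct MemBlock *pPre;   /* previous block in the list */
    struct MemBlock *pNext;  /* next block in the list */
    bool used;               /* true while the block is allocated */
    uint8_t *pStart;         /* start of the valid data */
    uint8_t *pEnd;           /* end of the valid data */
    size_t size;             /* capacity of the data area after the header */
};

static _Alignas(max_align_t) uint8_t pool[POOL_SIZE];
struct MemBlock *mem_head;

/* Round n up so that a header placed after n data bytes stays aligned. */
static size_t round_up(size_t n)
{
    size_t a = _Alignof(struct MemBlock);
    return (n + a - 1) / a * a;
}

/* Start with one free block covering the whole pool. */
void MemInit(void)
{
    mem_head = (struct MemBlock *)pool;
    mem_head->pPre = NULL;
    mem_head->pNext = NULL;
    mem_head->used = false;
    mem_head->size = POOL_SIZE - sizeof(struct MemBlock);
    mem_head->pStart = mem_head->pEnd = (uint8_t *)(mem_head + 1);
}

/* First fit: take the first free block that is large enough and
 * split off the remainder as a new free block when it is big enough. */
struct MemBlock *MemAlloc(size_t want)
{
    struct MemBlock *b;

    want = round_up(want);
    for (b = mem_head; b != NULL; b = b->pNext) {
        if (b->used || b->size < want) {
            continue;
        }
        if (b->size - want >= sizeof(struct MemBlock) + MIN_SPLIT) {
            struct MemBlock *rest =
                (struct MemBlock *)((uint8_t *)(b + 1) + want);
            rest->used = false;
            rest->size = b->size - want - sizeof(struct MemBlock);
            rest->pStart = rest->pEnd = (uint8_t *)(rest + 1);
            rest->pPre = b;
            rest->pNext = b->pNext;
            if (b->pNext != NULL) {
                b->pNext->pPre = rest;
            }
            b->pNext = rest;
            b->size = want;
        }
        b->used = true;
        b->pStart = b->pEnd = (uint8_t *)(b + 1);  /* no valid data yet */
        return b;
    }
    return NULL;
}

/* Absorb the block after b into b. */
static void merge_with_next(struct MemBlock *b)
{
    struct MemBlock *n = b->pNext;

    b->size += sizeof(struct MemBlock) + n->size;
    b->pNext = n->pNext;
    if (n->pNext != NULL) {
        n->pNext->pPre = b;
    }
}

/* Release a block and merge it with free neighbours. */
void MemFree(struct MemBlock *b)
{
    assert(b != NULL && b->used);
    b->used = false;
    if (b->pNext != NULL && !b->pNext->used) {
        merge_with_next(b);
    }
    if (b->pPre != NULL && !b->pPre->used) {
        merge_with_next(b->pPre);
    }
}
```

With such a layout, protocol layers can hand a packet to each other by passing the `struct MemBlock *` and adjusting `pStart` and `pEnd`, without copying the packet data.
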
The `used` field indicates whether the memory block is in use, and `pStart` and `pEnd` hold the start and end addresses of the valid data within the block's data portion. During allocation, the list is searched for an unallocated block larger than the required space, and the required size is cut from it. If a significant amount of memory remains after the cut, the remainder is split off as a new memory block and inserted into the linked list. During release, `used` is set to false; if the block pointed to by `pNext` or `pPre` is also free, the two are merged to limit memory fragmentation. Passing a data packet between protocol layers only requires passing the starting address of its memory block. This method wastes very little space but has a relatively high computational cost.

3.2 Implementation of Reordering, Retransmission, and Window Control

Reordering, retransmission, and window control are implemented with queue-based buffering. Each element in a queue points to a data packet, and the queue length is not limited.

For reordering, the `ooSeq` queue is used. If the sequence number of a received TCP packet is not the expected one but lies within the receive window, the packet is neither delivered immediately nor discarded; it is placed in the `ooSeq` queue. Once the expected TCP packet arrives, the `ooSeq` queue is checked for packets that have now become the expected ones; any such packets are removed from the queue and processed.

For retransmission, the unacked queue is used. Each TCP packet that requires an acknowledgment is placed in the unacked queue after transmission and is removed only once it has been acknowledged. The TCP retransmission timer applies only to the first packet in the unacked queue.

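The `ooSeq` handling described in this section can be sketched as follows. This is a simplified illustration, not the paper's implementation: apart from `ooSeq`, every name (`struct Segment`, `struct Conn`, `rcv_nxt`, `rcv_wnd`, `tcp_input_segment`) is an assumption, the receive window is treated as fixed, and duplicate or overlapping segments are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Segment {
    uint32_t seq;          /* sequence number of the first data byte */
    uint16_t len;          /* data length in bytes */
    struct Segment *next;  /* link used while the segment is queued */
};

struct Conn {
    uint32_t rcv_nxt;       /* next expected sequence number */
    uint16_t rcv_wnd;       /* receive window size in bytes */
    struct Segment *ooSeq;  /* out-of-order segments waiting for a gap to fill */
    uint32_t delivered;     /* bytes handed to the application (illustration) */
};

/* Hand a segment to the application and advance the expected number. */
static void deliver(struct Conn *c, const struct Segment *s)
{
    c->delivered += s->len;
    c->rcv_nxt += s->len;
}

/* Remove and return the queued segment that starts at rcv_nxt, or NULL. */
static struct Segment *take_expected(struct Conn *c)
{
    struct Segment **pp;

    for (pp = &c->ooSeq; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->seq == c->rcv_nxt) {
            struct Segment *s = *pp;
            *pp = s->next;
            s->next = NULL;
            return s;
        }
    }
    return NULL;
}

/* Returns 1 if the segment was delivered, 0 if it was queued in ooSeq,
 * and -1 if it lies outside the receive window and is dropped. */
int tcp_input_segment(struct Conn *c, struct Segment *s)
{
    assert(c != NULL && s != NULL);

    if (s->seq == c->rcv_nxt) {
        deliver(c, s);
        /* Queued segments may now have become the expected ones. */
        while ((s = take_expected(c)) != NULL) {
            deliver(c, s);
        }
        return 1;
    }
    /* Unsigned subtraction keeps the window test valid across wraparound. */
    if ((uint32_t)(s->seq - c->rcv_nxt) < c->rcv_wnd) {
        s->next = c->ooSeq;
        c->ooSeq = s;
        return 0;
    }
    return -1;
}
```

Segments that arrive in the order 3, 2, 1 are held in `ooSeq` until segment 1 arrives, at which point all three are delivered in sequence.
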
If the retransmission timer expires, the first packet in the unacked queue is retransmitted; if the number of retransmissions exceeds the specified limit, an error is reported.

For window control, the unsend queue is used. If the recipient's window is too small to receive a packet, only part of the data is sent and the rest is placed in the unsend queue. The sender then waits for a TCP packet from the recipient that announces a new window size before re-evaluating whether it can transmit. While the unsend queue is not empty, every new packet to be sent must be appended to it rather than sent directly.

3.3 Implementation of Piggyback Acknowledgment

Piggyback acknowledgment means that a TCP packet requiring acknowledgment is not acknowledged immediately on arrival; instead, the receiver waits for a short period. If outgoing data becomes ready within that period, the acknowledgment is carried along with it, which reduces the number of packets sent. With piggybacking, not every packet is acknowledged individually: acknowledging one packet also acknowledges all packets that precede it.

4. Summary

Embedded systems are largely built on low-speed 8/16-bit processors. Because of their limited resources, implementing a complete TCP/IP protocol for Internet access is difficult. Aiming to deliver the required functionality while saving system resources, this paper proposes a targeted modular design and optimization of the protocol, which allows a TCP/IP protocol suite to be embedded on a microcontroller or ARM processor for embedded Internet access.

The optimized embedded TCP/IP supports multiple TCP connections through a socket interface; supports multiple network devices; sends and forwards data packets through a gateway; responds to ping commands; and supports TCP packet reordering, retransmission, window control, and flow control. Practice has shown that the design is flexible and can be extended with more complex functions according to user needs.
