1. Introduction
With the development of embedded technology, the demands for system intelligence and miniaturization keep increasing. ARM-based microprocessors, with their high performance, low power consumption, and low cost, are widely used in electronic products, particularly in high-end embedded control applications such as mobile phones, industrial control, and network communication. ARM technology offers excellent performance and efficiency, and its partners include many of the world's leading semiconductor companies; it can fairly be said that ARM technology is virtually ubiquitous. Meanwhile, the TCP/IP protocol suite has become the worldwide standard for open systems interconnection, providing excellent interoperability and compatibility with a wide range of network technologies. The combination of embedded technology and TCP/IP has shown strong momentum and enormous market potential. How to write efficient code for ARM, and in particular how to improve the execution efficiency of fundamental software modules such as the TCP/IP protocol stack, has therefore become a crucial question for every developer of ARM-based embedded systems.
2. Application Areas and Characteristics of ARM Microprocessors
As a high-performance, low-cost, and low-power embedded RISC microprocessor, the ARM microprocessor has become the most widely used embedded microprocessor.
2.1 Application Areas of ARM Microprocessors
To date, ARM microprocessors and ARM technology have penetrated almost every field.
(1) Industrial control field: As a 32-bit RISC architecture, ARM core-based microcontroller chips not only occupy most of the high-end microcontroller market share, but also gradually expand into the low-end microcontroller application field. The low power consumption and high cost performance of ARM microcontrollers pose a challenge to traditional 8-bit/16-bit microcontrollers.
(2) Wireless communication field: Currently, more than 85% of wireless communication devices have adopted ARM technology. ARM has an increasingly solid position in this field due to its high performance and low cost.
(3) Network Applications: With the promotion of broadband technology, ADSL chips using ARM technology are gradually gaining a competitive advantage. In addition, ARM has been optimized for voice and video processing and has gained widespread support, which also poses a challenge to the application areas of DSPs.
(4) Consumer electronics: ARM technology is widely used in currently popular digital audio players, digital set-top boxes and game consoles.
(5) Imaging and security products: Most popular digital cameras and printers now use ARM technology. The 32-bit SIM smart cards in mobile phones also use ARM technology. In addition, ARM microprocessors and technology are applied in many different fields and will be even more widely used in the future.
2.2 Characteristics of ARM microprocessors
(1) Adopting a RISC architecture;
(2) Small size, low power consumption, low cost, and high performance;
(3) Supports both Thumb (16-bit) and ARM (32-bit) instruction sets, providing excellent compatibility with 8-bit and 16-bit devices;
(4) Extensive use of registers results in faster instruction execution;
(5) Most data operations are performed in registers;
(6) The addressing mode is flexible and simple, and the execution efficiency is high;
(7) The instruction length is fixed.
2.3 Instruction Structure of ARM Microprocessors
ARM microprocessors with newer architectures support two instruction sets: the ARM instruction set and the Thumb instruction set. ARM instructions are 32 bits long, while Thumb instructions are 16 bits long. The Thumb instruction set is a compact re-encoding of a subset of the ARM instruction set; compared with equivalent ARM code, it can save 30% to 40% or more of storage space while retaining the advantages of running on a 32-bit core.
3. Embedded System Development and Design
The development process for embedded systems is broadly similar to conventional software development: coding, compiling, linking, and running, with iterative cycles of online debugging and recoding. There are, however, some important differences.
First, the development platforms differ. Because embedded targets have limited processing power, embedded development generally employs a cross-compilation environment. Cross-compilation means building, on platform A, a program that will run on platform B; the compiler that runs on platform A but generates code for platform B is called a cross-compiler. For a beginner, setting up such an environment can take several days.
Second, the debugging methods differ. Programs developed on Windows or Linux can be run immediately to view the results, and can be debugged from within an IDE. Embedded developers, however, must go through a longer sequence of steps to achieve the same thing. The most popular approach today is to connect to the target system over JTAG, then download and run the compiled code; a capable debugger then allows debugging almost as conveniently as in a Visual C++ environment.
Furthermore, the level at which developers must understand the system differs. Application developers focus on understanding and implementing application requirements; embedded developers must understand the details of the entire process far more deeply. The biggest difference is that programs running under an operating system do not require you to care about the program's runtime address or the final location of each program block after linking. Operating systems such as Windows and Linux, which require MMU support, place programs at a fixed address in the virtual address space: regardless of where the program actually sits in RAM, the MMU maps it to that fixed virtual address.

Why does a program's execution depend on its storage address? Anyone who has studied assembly language, or examined compiled machine code, knows that variables and functions are ultimately represented as addresses: program jumps, subroutine calls, and variable accesses all work by having the CPU fetch those addresses directly. The `text_base` specified at link time is the reference value for all of these addresses. If the address you specify does not match where the program finally resides, it obviously will not run correctly. (There are exceptions, but unusual usage naturally demands unusual effort.) There are two ways to solve this problem. One is to place position-independent code at the very beginning of the program, which copies the rest of the program to the `text_base` you specified before jumping to it. The other is to set `text_base` to the program's storage address, copy the program to that address, and use a variable to record that address as a reference value.
Subsequent symbol table addresses then use this value as a base, combined with an offset, to form their actual addresses. This sounds convoluted and is difficult to implement; a better solution, using a bootloader, is discussed below.

Furthermore, a complete program has at least three segments: a `text` segment (the program body, i.e., the compiled machine instructions), a `data` segment (initialized variables), and a `bss` segment (uninitialized variables). The `text_base` mentioned above is only the base address of the `text` segment. If the entire program is loaded into RAM, the three segments can simply be placed contiguously. If the program is stored in ROM or flash memory, however, you must specify separate addresses for the other segments, because the code itself does not change during execution while the two data segments do. These assignments are all made at link time, and every toolchain provides ways for you to control them. Here again, programming under an operating system shields you from these details and spares you the headaches; embedded developers are not so lucky, and always start from scratch on a cold, bare chip.

On power-on reset, the CPU always fetches its first instructions from a fixed address and begins its busy work. On a PC, what lives at that address is the BIOS. Embedded systems generally have no BIOS support, and RAM does not retain its contents across power loss, so the program must be stored in ROM or flash. But the width and speed of these memories generally cannot match RAM, and running code directly from them reduces execution speed. Most designs therefore store a bootloader there. Bootloaders vary widely in capability: a basic one merely performs some system initialization, copies the user program to a given address, and then jumps to it, relinquishing control of the CPU.
More capable bootloaders also support downloading over the network or a serial port, and even debugging functions. But do not expect a universal bootloader, like a PC BIOS, to be available off the shelf: at the very least you will need to do some porting work to adapt one to your system, and that porting is itself part of your development process. For a beginner in embedded development, porting or writing a bootloader is extremely instructive.

Can you run without a bootloader? Of course. Either you sacrifice speed and run directly from ROM, or you write the code that moves your program into RAM yourself. Most importantly, you need good debugging tools that support online debugging during development; otherwise you will have to re-flash the chip to verify every change, even to a single variable.

Returning to program entry points: whatever the process, a program is ultimately executed as machine instructions, and a pure executable image is simply a collection of those instructions. Programs that run under an operating system are not pure executable images; they have a format. Besides the segments discussed above, this format generally records the program length, checksums, and the program entry point, i.e., where execution of the user program begins. Why is an entry point needed when we already have the program's address? Because the code you actually want to execute first is not necessarily placed at the very beginning of the file; even if it comes first in your source, unless you control the linking, the compiler may not place that code first when linking multiple files. For programs supported by an operating system, a `main` function suffices as the entry point. Note that `main` is only a conventional entry point for most compilers; unless you are using someone else's initialization library, you can set the entry point yourself.
Clearly, executable files with a defined format are more flexible, but they require bootloader support. For details of one such format, see the ELF (Executable and Linkable Format) specification.
4. Application of ARM-oriented program optimization in embedded TCP/IP protocol implementation
The authors built a network-oriented embedded hardware platform around the Atmel AT91RM9200 microprocessor and the DM9161 Ethernet physical-layer (PHY) chip, as shown in Figure 1. On this platform, embedded TCP/IP protocol processing based on the ARM microprocessor was implemented.
Figure 1. Block diagram of network-based embedded system hardware platform
ARM-based embedded systems handle Ethernet frame data directly; a typical Ethernet data encapsulation format is shown in Figure 2. Following the optimization methods described above, memory layout must be considered when defining variables, so that variables of each type are aligned on 32-bit word boundaries. Data used in computations within function calls should be processed as 32-bit values whenever possible.
Figure 2 Typical Ethernet data encapsulation format
Embedded TCP/IP implementations typically follow the layered TCP/IP architecture used in Linux. The protocol stack implements ARP/RARP, IP, ICMP, TCP, UDP, and other protocols at the network and transport layers, directly supporting application-layer protocols such as HTTP, SMTP, FTP, and Telnet. Each system must define the interface between application programs and the protocol software. The general protocol processing flow is shown in Figure 3. Protocol processing requires many conditional checks, and loops for IP address comparison and TCP data verification are unavoidable. The program can therefore be optimized by making full use of conditional tests that compare against zero and loops that count down to zero.
Figure 3. Protocol processing flowchart
5. Conclusion
This paper first introduces the architectural characteristics of the ARM microprocessor, then discusses the development of embedded systems. Next, considering the characteristics of TCP/IP networks, it presents the design of an ARM-based embedded system and outlines the corresponding workflow. In practical applications, optimizing ARM instructions is also crucial. A thorough understanding of the characteristics of ARM assembly instructions and the compilation process, coupled with the appropriate application of program optimization principles and methods, can effectively improve compilation and code execution efficiency.