A Brief Analysis of Optimization Issues in Embedded Programming
2026-04-06 06:23:26 · #1
Due to limitations in power consumption, cost, and size, embedded microprocessors lag far behind desktop processors in processing power. Consequently, embedded systems face more stringent requirements on program execution time and memory footprint, and performance optimization is typically necessary for embedded applications to meet their performance demands.

1. Types of Embedded Program Optimization

Embedded application optimization refers to modifying a program's algorithms and structure, and using software development tools, to improve the program without altering its functionality, so that it runs faster or occupies less code space. Depending on the focus of optimization, program optimization can be divided into speed optimization and code-size optimization. Speed optimization, based on a thorough understanding of hardware and software characteristics, adjusts the application's structure to shorten the execution time required to complete a given task. Code-size optimization, on the other hand, aims to minimize the amount of code while still correctly implementing the required functions. In practice, the two goals often conflict: increasing speed may come at the cost of larger code, while reducing code size may slow execution. A specific optimization strategy should therefore be developed from actual needs before optimizing. With the continuous development of computer and microelectronics technology, storage space is no longer the main constraint on embedded systems, so this paper mainly discusses optimization of running speed.

2. Principles of Embedded Program Optimization

Embedded program optimization mainly follows three principles.
① Equivalence principle: the program implements the same functions before and after optimization.
② Effectiveness principle: after optimization, the program should run faster, occupy less storage, or both.
③ Economy principle: optimization should achieve better results at lower cost.

[b]3. Main Aspects of Embedded Program Optimization[/b]

Embedded program optimization falls into three areas: algorithm and data structure optimization, compiler optimization, and code optimization.

3.1 Algorithm and Data Structure Optimization

Algorithms and data structures are the core of program design, and the quality of the algorithm largely determines the quality of the program. A given function can usually be implemented by several algorithms whose complexity and efficiency differ greatly, so choosing an efficient algorithm, or improving an existing one, can yield substantial performance gains. For example, in data searching, binary search is faster than sequential search. Recursive programs make many procedure calls and push all local variables of each invocation onto the stack, so their time and space efficiency is very low; converting recursion to iteration, to explicit stack manipulation, or to another non-recursive form can significantly improve performance. Data structures also play a crucial role in program design: for example, if data items are frequently inserted into and deleted from an unordered collection, a linked list is the faster structure. Algorithm and data structure optimization should be the first optimization technique applied.

3.2 Compiler Optimization

Many compilers now possess code optimization capabilities.
During compilation, parallel-programming techniques are used to perform dependence analysis; semantic information about the source program is gathered; and software pipelining, data layout planning, and loop restructuring are applied to perform optimizations independent of the processor architecture, generating high-quality code. Many compilers offer several optimization levels from which a suitable one can be selected. Note that at the highest level the compiler pursues optimization aggressively, which can sometimes introduce errors. Furthermore, some dedicated compilers are tuned for particular architectures and exploit the hardware fully to generate high-quality code. For example, the Intel compiler for Microsoft eMbedded Visual C++ targets the Intel XScale architecture and is highly optimized to produce faster code. It uses a variety of optimization techniques, including instruction scheduling to keep the pipeline busy, support for Intel XScale dual load/store technology, and interprocedural optimization (keeping variables used by functions in registers for fast access). During embedded software development, a compiler with strong optimization capability should be selected and its code optimization features used fully to generate efficient code and improve the program's running efficiency.

3.3 Code Optimization

Code optimization replaces the original code with assembly language or more concise source code so that the compiled program runs more efficiently. The compiler can optimize automatically within the scope of program segments and code blocks, but it has difficulty recovering semantic information, algorithmic flow, and runtime behavior, so programmers also need to optimize manually.
The following are some commonly used optimization techniques.

(1) Code replacement. Replace long-cycle instructions with short-cycle ones to reduce operation strength.
① Reduce division operations. In relational expressions, multiply both sides by the divisor to avoid the division; some division and modulo operations can also be replaced by bit operations. A shift requires only one instruction cycle, while the "/" operation may call a subroutine, producing longer code and slower execution. For example (the rewritten comparison is equivalent when the divisor is positive, and the shift requires an unsigned or non-negative a):
Before optimization: if ((a/b)>c) and a=a/4
After optimization: if (a>(b*c)) and a=a>>2
② Reduce exponentiation operations. For example:
Before optimization: a=pow(a, 3.0)
After optimization: a=a*a*a
③ Use increment and decrement instructions. For example:
Before optimization: a=a+1, a=a-1
After optimization: a++, a-- (or inc, dec)
④ Use the smallest data type that suffices. Provided the variable meets the usage requirements, prefer, in order: character (char) > integer (int) > long integer (long int) > floating point (float). For division, unsigned numbers are more efficient than signed numbers. Minimize data type casts in actual calls; use floating-point operations sparingly, and if the result can be kept within the required error bounds, long integers can replace floating-point types.

(2) Use fewer global variables and more local variables. Global variables are stored in data memory; defining them reduces the data memory available to the MCU, and too many of them can leave the compiler with insufficient memory to allocate. Local variables mostly reside in the MCU's internal registers. In most MCUs, register operations are faster than data-memory operations and the instructions are more flexible, which helps the compiler generate higher-quality code.
Moreover, the registers and data memory occupied by local variables can be reused across different modules.

(3) Use register variables. When a variable is read and written frequently, repeated memory accesses cost significant time. To improve access efficiency, it can be declared a register variable, which is read and written directly without accessing memory. Loop control variables with many iterations, and variables used repeatedly inside a loop body, are good candidates; a loop counter is the classic choice. Only local automatic variables and formal parameters can be declared register variables: register variables use dynamic (automatic) storage, so any variable that requires static storage cannot be one. The specifier for register variables is register.

(4) Reduce or avoid time-consuming operations. Most of an application's runtime is usually spent in critical modules, which often contain loops or nested loops; reducing time-consuming operations inside loops improves execution speed. Common time-consuming operations include input/output, file access, graphical-interface operations, and system calls. If file reading and writing cannot be avoided, file access becomes a major factor in running speed. There are two ways to improve file access speed: one is to use memory-mapped files; the other is to use a memory cache.

(5) Optimize switch statements. Order the case labels by probability: put the most likely case first and the least likely last to improve the execution speed of the switch block.

(6) Optimize the loop body. The loop body is a key focus of program design and optimization.
Computations that do not involve the loop variable can be hoisted outside the loop. For loop bodies with a fixed iteration count, a for loop is more efficient than a while loop, and a decrementing loop (counting down to zero) is faster than an incrementing one. In each iteration, two instructions execute in addition to the loop body: a subtraction (to decrement the loop count) and a conditional branch. These instructions are called the "loop overhead". On an ARM processor, the subtraction takes 1 cycle and the conditional branch 3 cycles, so each iteration adds 4 cycles of overhead. Loop unrolling can improve loop speed: repeat the loop body several times and reduce the iteration count proportionally, trading larger code size for less loop overhead and faster execution.

(7) Function calls. Efficient function calls should limit the number of parameters, preferably to at most four. In ARM calls, the first four parameters are passed in registers, while the fifth and subsequent parameters are passed on the memory stack. If more parameters are needed, organize them into a structure and pass a pointer to the structure instead.

(8) Inline functions and inline assembly. Functions with a great impact on performance can be inlined with the __inline keyword, saving call overhead; the negative impact is larger code size. Time-critical parts of the program can be written in inline assembly, which usually brings a significant speed improvement.

(9) Use table lookup instead of calculation. Avoid performing very complex calculations in the program, such as floating-point square roots.
For such time-consuming, resource-intensive calculations, space can be traded for time: compute the function values in advance and place them in the program storage area as a table. At run time the program simply looks up the table, eliminating repeated calculation during execution.

(10) Use hardware-optimized function libraries. Intel's GPP (Graphics Performance Primitives) and IPP (Integrated Performance Primitives) libraries, designed for XScale processors, have been hand-optimized for typical operations and algorithms in multimedia processing, graphics processing, and numerical computation. They exploit the computing potential of the XScale hardware well and achieve high execution efficiency.

(11) Exploit hardware characteristics. To improve running efficiency, make full use of hardware features to reduce overhead, for example by reducing the number of interrupts and using DMA transfer mode. The CPU's access speed to the various memories ranks as follows: internal RAM > external synchronous RAM > external asynchronous RAM > Flash/ROM. If the CPU reads and executes program code directly from Flash or ROM, execution is slow; therefore, after system startup, the target code can be copied from Flash or ROM into RAM and executed there, improving program speed.

4. Conclusion

Performance optimization of embedded programs often conflicts with the software development cycle, development cost, and software readability, so a trade-off must be made, weighing the pros and cons.
Algorithm and data structure optimization should be applied first; then, based on factors such as functionality, performance differences, and budget, select efficient compilers, system runtime libraries, and graphics libraries; use performance-monitoring tools to locate the program hotspots that consume the majority of the runtime and optimize them with code-level techniques; finally, compile with an efficient optimizing compiler to obtain high-quality code.