Abstract: Various embedded processors and SoCs are used in various devices such as sensors, mobile phones, and PDAs. Each device has its own requirements for execution performance, size, cost, power consumption, heat dissipation, etc. Among them, power consumption and heat dissipation are particularly important. Based on the optimization of software in embedded systems, this paper proposes an algorithm for reordering instructions to reduce system power consumption. Introduction Since the birth of the world's first microprocessor designed for embedded applications, Intel 4004, in the 1970s, embedded systems have been developing for more than 30 years. In recent years, embedded systems have become one of the most dynamic branches in the electronic information industry. With the large-scale application of emerging products such as mobile phones, PDAs, GPS, and set-top boxes, the market for embedded systems is increasing at a rate of 30% per year, and the design of embedded systems has become an increasingly important topic for software and hardware engineers. Embedded systems are application-centric, computer technology-based, and software and hardware can be customized. They are dedicated computer systems that are suitable for application systems with strict requirements for functionality, reliability, cost, size, power consumption, etc. [1]. In the design of embedded systems, low-power design is a problem that must be faced. The reason for this is that embedded systems are widely used in portable and highly mobile products, which often rely on batteries rather than a constant power supply. Therefore, every detail must be considered to reduce power consumption and extend battery life as much as possible. In fact, considering low-power design holistically has become an increasingly urgent issue. Low power consumption is a crucial characteristic that portable electronic devices must possess. Research in the past few years has mainly focused on hardware, but now the focus is more on reducing system power consumption through software optimization. To optimize software, it is necessary to understand the power consumption of each instruction and choose the correct compilation method to reduce the power consumption of program execution. Because various microprocessor architectures differ, instruction sets and power consumption also vary. Therefore, optimization methods suitable for one processor may not be suitable for other processors. Thus, choosing a microprocessor that matches the power-saving software is crucial. 1. Compiler Optimization The compiler's role is to translate programs written in high-level languages, such as C/C++, into programs that can be executed on the target machine. In other words, compilers provide an abstraction layer for high-level language programmers, enabling them to easily solve practical problems by writing high-level language code similar to the actual problems (without using assembly or machine language); at the same time, it ensures the readability and maintainability of the program, improving the efficiency of software development. Furthermore, porting the program to a new target machine only requires recompiling the program with the appropriate compiler, without having to rewrite the program. However, in some cases, this comes at the cost of program execution performance. The effectiveness of a compiler and the efficiency of the code it generates can be compared with the code written by expert-level assembly/machine language programmers; therefore, more efficient code can be generated through compiler optimization. Optimizing the compiler can effectively reduce the power consumption of embedded devices. In a program, each instruction activates certain hardware components in the microprocessor; therefore, correctly selecting instructions can reduce processor power consumption. By establishing power consumption information for the instruction set under a specific processor architecture, and using methods such as "instruction reordering to reduce jumps," effective low-power software optimization can be performed. Two assumptions are made here: ① Each instruction has a fixed power consumption; ② The heat dissipation of each instruction is independent of its operands and other instructions. As can be seen from Figure 1, by reordering the instructions, the initial power status of a program, as shown in Figure 1(a), can be transformed into the status shown in Figure 1(b). It can be concluded that although the local heat dissipation status is different in the two cases, the total power consumption is the same. In other words, some adjustments can be made to the local heat dissipation status of the program without affecting the total power consumption to meet the actual needs. The power consumption of the system is reduced by reordering the instructions [2]. [align=center][img=303,142]http://www.e-works.net.cn/images/127973501637812500.GIF[/img] Figure 1 Two possibilities of power in local area within a program[/align] 2. Instruction sorting We know that the power of a processor running a specific program is P = I × Vdd (I is the average current, Vdd is the given voltage), then the power consumption of the program is E = P × t (t is the execution time of the program); at the same time, t = N × T (T is the instruction cycle, which is the reciprocal of the main frequency, N is the number of cycles of the program execution). In embedded systems, especially in mobile devices, they are generally powered by batteries, so the power consumption of the system is a very important indicator. Now, Vdd and T are known quantities, so the electrical energy E consumed by the program is proportional to the product of the current I and the number of program cycles N. Here, we use the model established in reference [3] to illustrate this. In this model, the current I required to execute each instruction is measured and estimated by using devices such as an oscilloscope [4]. In summary, the characteristics of multiple data storage areas in embedded processors can be utilized to achieve parallel data processing. By sorting instructions, the execution cycle of instructions can be reduced, thereby reducing power consumption. 2.1 Example Assume there is a C language program, as shown in Figure 2(a). Figure 2(b) is its corresponding assembly code, and Figure 2(c) shows the data dependency graph (DDG) with two weights for each node. The first weight indicates the depth of the node in the DDG, such as the first weight of V10 being 1 and the first weight of V0 being 6. Assume that the larger this weight is, the higher its priority, as shown in Figure 2(c) where V0 and V1 have the highest priority. [align=center][img=375,265]http://www.e-works.net.cn/images/127973502079531250.GIF[/img] Figure 2 C language code, assembly code and data dependency graph[/align] Figure 3 shows the execution order of instructions before using the algorithm in this paper. Note that the bold text in the diagram, namely V2, V6, and V9, is different from the other instructions. These are ADD or MPY instructions, which require the use of the system's ALU. ALU operations and MOVE operations can be executed simultaneously within the same instruction cycle, but two ALU operations cannot be executed concurrently. [align=center][img=177,148]http://www.e-works.net.cn/images/127973502751562500.GIF[/img] Figure 3: Execution order of nodes before instruction sorting[/align] The second weight of a node indicates the lifetime of the associated register. As shown in Figure 4, the register that V0 depends on is r0, and its lifetime is 1 to 3, which is 2. From the diagram, we can conclude that this program requires a total of 11 instruction cycles and uses at least 2 registers simultaneously. [align=center][img=357,237]http://www.e-works.net.cn/images/127973502926562500.GIF[/img] Figure 4: The state before instruction reordering[/align] Figure 5 shows the state after reordering the instructions based on the algorithm in this paper. The total execution cycle of the program becomes 6, but the number of registers used increases to 3. It can also be seen that there is a trade-off between the execution cycle of the program and the number of registers. [align=center][img=347,380]http://www.e-works.net.cn/images/127973503075625000.GIF[/img] Figure 5: The state after sorting algorithm[/align] The model established in reference [3] is used in this paper to calculate the power consumption of the program. In Figure 5, the total current required for program execution is I = 780 mA, and the total number of execution cycles is N = 6. Therefore, the circuit consumption is E = N × I = 6 × 780 mA = 4680 mA. Without using any algorithm, as shown in Figure 2, E = N × I = 1080 × 11 = 11880 mA. By using the algorithm in this paper, the program execution cycle is reduced, and the power consumption of the program is also reduced. That is to say, by using the algorithm in this paper, the execution performance of the program is improved, and the power consumption of the system is optimized to the greatest extent. It can be seen that at this level, the choice of algorithm is very important. 2.2 Algorithm Description The algorithm in this paper is based on the sorting mechanism based on the serial array proposed in reference [5]. It is mainly aimed at reducing the execution cycle of the program, while taking into account the use of as few registers as possible. The program is described as follows: ① Construct the data dependency graph DDG. ② Construct weighted tuples, where the first weight is the depth of the node in DDG, set as P; the second weight is the lifetime, set as L. ③ Search the ready table R (as shown in Figure 3). ④ while the ready list R is not empty do the value of P as the highest priority of the highest node if the depth of the node in the current instruction cycle 3. Conclusion In recent years, power consumption has become an increasingly important issue in embedded applications. This is especially true in mobile devices, where battery power makes power consumption particularly critical. Current compilers rarely fully utilize the various features of the processor, resulting in code that cannot compare to code written by expert assembly programmers. This paper proposes an optimization compiler algorithm that reorders instructions from a software perspective to reduce system power consumption. Future work will focus on selecting and researching specific microprocessors, creating tools to generate instruction set power consumption information for these microprocessors, and further applying the algorithm to optimize compilation, ultimately achieving power consumption optimization. References [1] Wayne Wolf. Embedded computing system design principles. Translated by Sun Yufang et al. Beijing: Machinery Industry Press, 2002 [2] Sathishkumar Udayanarayanan. Energy-efficient code generation for DSP56000 family, MS. Thesis in Arizona State University (Aug. 2000) [3] Gibbons PA, Muchnick S S. Efficient Instruction Scheduling for a Pipelined Processor, in Proc. of the SIGPLAN Symposium on Compiler Construction (July 1986), pp. 11-16 [4] Ulrich Kremer. Low Power/Energy Compiler Optimizations [5] Wen Tsong Shiue. Retargetable Compilation for Low Power Wang Lisheng, Master's Supervisor. Xia Zhijiang, Master: Main research direction is embedded systems and their applications.