Real-time performance comparison and evaluation of embedded operating systems

Abstract: This paper focuses on a series of indicators affecting the real-time performance of embedded operating systems. Based on a comparative experimental platform, a novel CPLD-based indicator evaluation method is proposed. WinCE and Linux operating systems are used as test objects for comparative testing and evaluation. Finally, the test results are analyzed in conjunction with the architectures and scheduling strategies of the two operating systems.

Keywords: RTOS, Comparison Evaluation, Real-time Performance, Comparison Experimental Platform

Introduction

Embedded Real-Time Operating Systems (RTOS) provide a system-level support environment for embedded application developers, greatly simplifying the design process of embedded software systems and becoming a very important branch of operating systems. With the widespread application of RTOS in embedded systems, the selection and evaluation of RTOS has become an important issue. The evaluation of an RTOS should be conducted from many perspectives, such as architecture, API richness, network support, and reliability. Among these, real-time performance is one of the most important indicators for RTOS evaluation, and its quality is an important reference for users when choosing an operating system. This paper focuses on discussing which indicators should be emphasized when evaluating the real-time performance of an operating system and how to test them.

1. Main Indicators of Operating System Real-Time Performance

Strictly speaking, there are many factors that affect the real-time performance of an embedded operating system. Due to space limitations, this article only lists six main factors that affect the real-time performance of the operating system.

(1) Average running time of common system calls: This indicator, i.e., system call efficiency, refers to the average time required for the kernel to execute common system calls.
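As a rough illustration of indicator (1), the sketch below averages the cost of one system call over many iterations. It is only a software-timed approximation under stated assumptions: the helper names `now_ns`, `average_call_ns`, and `call_getppid` are hypothetical, the clock is POSIX `clock_gettime`, and the result includes loop and function-call overhead. The method proposed in this paper times with the CPLD instead of a software clock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and getppid */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: average duration in ns of one call to op(),
 * measured over `iterations` back-to-back calls. */
static double average_call_ns(void (*op)(void), unsigned iterations)
{
    assert(op != NULL);
    if (iterations == 0)
        return 0.0;
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iterations; i++)
        op();
    uint64_t end = now_ns();
    return (double)(end - start) / (double)iterations;
}

/* Sample operation: getppid() is a cheap call that enters the kernel. */
static void call_getppid(void)
{
    (void)getppid();
}
```

Calling `average_call_ns(call_getppid, 100000)` yields a mean per-call figure; other calls from the sample set can be wrapped in the same way.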
The POSIX standard can serve as a reference: common system calls can be selected for testing by category, covering processes, threads, synchronization primitives (semaphores, mutexes, etc.), files, memory, interrupt handling, clocks, and time. These include creating/deleting processes and threads, creating/deleting files, reading/writing files, setting/getting priorities, creating/releasing semaphores, allocating/releasing memory space, loading/unloading interrupt handling modules, etc. The selected samples may not be complete; they are proposed only as a reference method.

(2) Task switching time: Task switching time refers to the time interval from when the current task stops running and saves its running state (CPU register contents) after an event triggers a switch, to when the state of the next task to be run is loaded and it starts running, as shown in Figure 1. Note that a task switch must be triggered by some event. Normally, this event is a synchronization primitive that forces a task switch and allows the process to be monitored. However, operating on synchronization primitives introduces some system overhead, and the efficiency of the various synchronization primitives differs between operating systems. Therefore, task switching tests are performed on the operating system under test using each synchronization primitive it supports, and the shortest time is taken as the measurement value to minimize the error. After testing Mutex, Semaphore, Critical Section, SVR5 Semaphore, POSIX Semaphore, and pthread_mutex, the best primitive for WinCE is Critical Section, while the best primitive for Linux is pthread_mutex.

(3) Thread switching time: Threads are the smallest units that can be scheduled. In embedded applications, many functions are executed as threads, so thread switching time is also a key indicator.
The test method and principle are similar to those for task switching and are not repeated here.

(4) Task preemption time: Task preemption time is the time consumed by a high-priority task to obtain system control from a running low-priority task, as shown in Figure 2.

(5) Semaphore shuffling time: Semaphore shuffling time refers to the time delay from when one task releases a semaphore to when another task waiting for the semaphore is activated, as shown in Figure 3. In embedded systems, many tasks often compete for a shared resource simultaneously. Semaphore-based mutual exclusion ensures that only one task can access the shared resource at any given time. Semaphore shuffling time reflects the time overhead related to mutual exclusion and is an important indicator of RTOS real-time performance.

(6) Interrupt response time: Interrupt response time refers to the time from the occurrence of an interrupt to the start of execution of the user's interrupt service routine code to handle the interrupt. Interrupt handling time is determined not only by the RTOS but also by the user's interrupt handler, so it is not included in the test framework.

For some or all of these indicators, there are already a considerable number of test methods and test programs, such as the Rhealstone method and a large number of benchmarks (lmbench, HbenchOS, etc.). However, these methods and programs all have certain defects: either their timing accuracy is insufficient because of inadequate timing methods, or they require too much professional hardware equipment (such as logic analyzers and oscilloscopes), which raises the test requirements and makes the test conditions difficult to achieve. To address these issues, this paper proposes a testing method that combines a CPLD with the target system.

2. Comparison Platform and Testing Method

2.1 Introduction to the Comparison Testing Platform

To better evaluate the software systems (including operating systems, bootloaders, user applications, and other system programs) at various levels of embedded systems, we designed and implemented a dual embedded system comparison experimental platform. The experimental platform is based on two Advantech PCM7230 development boards (based on the PXA255 processor) and one CPLD device. The operating systems under test run on the two development boards, ensuring identical test environments. The CPLD device is responsible for generating interrupt loads, synchronous set/reset triggering of the two systems, and timing functions. This ensures the accuracy of the test results, facilitates comparison and observation, and highlights the comparative nature of the evaluation process. Figure 4 shows the logical structure of the comparison testing platform. The main hardware in the comparison platform is as follows:

◇ CPU: XScale (400 MHz).
◇ Clock: HT1381.
◇ ROM: 1 MB AMD.
◇ SDRAM: 64 MB.
◇ Flash: 32 MB.
◇ I/O Resources: RS232 (COM1~4), RS485 (COM5), 2 USB Host and 1 USB Client, Ethernet (DM9000, 10/100Base-T), and AMI120 expansion bus interface.

2.2 Testing and Timing Methods

During testing, the currently popular benchmark testing method was used to evaluate the above real-time performance indicators. A corresponding test program was written for each indicator. A fundamental principle during the testing process was to minimize measurement errors as much as possible.
Multiple strategies were employed to reduce the impact of other factors on the test: shutting down unnecessary processes to reduce the CPU time they consume; disabling the data cache and instruction cache to avoid the impact of caching on the corresponding RTOS indicators; and repeating the test of each indicator many times and statistically analyzing its maximum, minimum, and average values to obtain the most objective results possible.

Compared to conventional benchmarking methods, this testing method combines a CPLD device with a test program. Using the abundant pin resources of the CPLD and the development board, the CPLD can be programmed to generate interrupt loads and to trigger the systems under test synchronously without adding extra load. It also reduces the number of system calls, so the test results are more accurate and closer to the kernel's own operating values.

Furthermore, the timing function during the test is implemented through CPLD programming. Compared to the traditional method of timing with RTOS kernel time system calls, this solves the problems of insufficient accuracy and of inconsistent units in system call return values across different operating systems. The CPLD device used in the comparison platform belongs to the Xilinx XC9500 series, with a maximum system clock frequency of 100 MHz and a maximum pin-to-pin delay of 10 ns, so the counter-based timer can resolve intervals to within tens of nanoseconds. This error is almost negligible and greatly improves timing precision, as shown in Figure 5.

The entire testing process is mainly divided into four parts: preparation, writing the kernel test programs, CPLD programming, and implementing the interaction with external systems. Preparation work includes compiling the kernel and modifying the bootloader. The bootloader is modified from ibootlite 1.8 to make it applicable to the comparison platform.
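The repeated-measurement strategy and the counter-based timing described above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the paper's actual test code: it assumes the counter is clocked at the 100 MHz maximum quoted for the XC9500 (one tick = 10 ns) and that the raw counter readings have already been collected into an array. The names `CPLD_CLOCK_HZ`, `timing_stats`, `ticks_to_ns`, and `summarize_ticks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: counter clock of 100 MHz (10 ns per tick). The clock actually
 * fed to the counter on the platform may be a divided, slower clock. */
#define CPLD_CLOCK_HZ 100000000ULL

/* Summary of repeated measurements of one indicator, in nanoseconds. */
struct timing_stats {
    uint64_t min_ns;
    uint64_t max_ns;
    double   avg_ns;
};

/* Convert a raw counter value to nanoseconds (safe for 32-bit readings). */
static uint64_t ticks_to_ns(uint64_t ticks)
{
    return ticks * 1000000000ULL / CPLD_CLOCK_HZ;
}

/* Reduce n (> 0) counter readings to their minimum, maximum and average. */
static struct timing_stats summarize_ticks(const uint32_t *ticks, size_t n)
{
    assert(ticks != NULL && n > 0);
    uint64_t min = ticks[0], max = ticks[0], sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (ticks[i] < min) min = ticks[i];
        if (ticks[i] > max) max = ticks[i];
        sum += ticks[i];
    }
    struct timing_stats s;
    s.min_ns = ticks_to_ns(min);
    s.max_ns = ticks_to_ns(max);
    s.avg_ns = (double)sum * 1e9 / (double)CPLD_CLOCK_HZ / (double)n;
    return s;
}
```

Because every operating system under test is timed by the same external counter, the resulting figures share one unit and can be compared directly.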
The kernel test program is divided into six modules according to the six indicators mentioned earlier, and each module is written separately. CPLD programming mainly includes timing programs and interrupt loaders. The external interaction part mainly includes serial communication and Ethernet card drivers.

Below is part of the VHDL source code running on the CPLD. Here, fenpin is the clock signal and flagreci is the received signal; when the platform is controlled manually via buttons, flagsend and flagstop start and stop the timing.

    process (fenpin, flagsend, flagreci, flagstop)
    begin
        flag <= flagsend & flagreci & flagstop;
        -- act on the rising edge of the clock
        if (fenpin'event and fenpin = '1') then
            if (flag = "0010000") then
                -- tempsendout saturates at 7
                if (tempsendout = "0000000000000111") then
                    tempsendout <= tempsendout;
                else
                    tempsendout <= tempsendout + '1';
                end if;
                -- timing counter
                countout <= countout + '1';
                if (tempsendout = "0000000000000111") then
                    outsend <= '0';
                    outsendled <= '0';
                    iscounting <= '1';
                else
                    outsend <= '1';
                    outsendled <= '1';
                end if;
            else
                iscounting <= '0';
                signdisp <= '1';
            end if;
        end if;
    end process;

The test program contains a lot of code, which is not listed here. Only the snippet embedded in the program that interacts with the CPLD is provided for reference.

    #define base_add     (*(volatile unsigned *)0x40E00000)
    #define gpio3_derect (*(volatile unsigned *)(0x40E00000 + 0x0C))
    #define gpio3_out1   (*(volatile unsigned *)(0x40E00000 + 0x18))

    int to_CPLD(void)
    {
        gpio3_derect = 0x8; /* Set the pin to output */
        gpio3_out1 = 0x8;   /* Output a high level */
        return 0;
    }

The first part of the code defines the addresses of the relevant registers. During the test, the GPIO3 pin of the PXA255 is used to interact with the CPLD to implement the timing function. Because it needs to run in kernel mode, this function is compiled into the kernel as a module. In the test program, this code is executed via the ioctl system call to send a signal to the CPLD. The CPLD measures the interval between two such signals, thus implementing the timing function.

3. Test Results and Analysis of Linux and WinCE

Based on the above indicator system and testing methods, we conducted relevant tests on Linux and WinCE. The Linux version was 2.4.19, and the WinCE version was WinCE.Net. Since the performance indicators of the same kernel can differ depending on the hardware platform and operating environment, the evaluation of different RTOSs is only valuable when they are compared under the same platform environment. The test aims at evaluation, and the evaluation is based on comparison. Table 1 shows the evaluation results for the two kernels. Due to space limitations, only the average time is listed here; the maximum and minimum values are not listed.

Table 1. Comparison and evaluation of the performance metrics of Linux and WinCE

As shown in Table 1, Linux 2.4.19 and WinCE.Net are similar in terms of task switching time, thread switching time, and average system call runtime. However, WinCE.Net significantly outperforms Linux 2.4.19 in terms of task preemption time, semaphore shuffling time, and interrupt response time. In general, WinCE.Net's real-time performance is superior to that of Linux 2.4.19. The following explains the test results from the perspectives of the characteristics and internal implementation mechanisms of the two operating systems.

Both Linux and WinCE allow different processes to have the same priority, unlike real-time kernels such as μC/OS (where each task has a unique priority). Therefore, they employ a hybrid scheduling strategy that combines preemption with round-robin scheduling. Consequently, when switching between processes of the same priority, their metrics are similar.

The Linux 2.4.19 used in the test is not a dedicated operating system designed for embedded real-time systems; it is merely a modified version of the existing general-purpose kernel. While WinCE.Net is not a strictly real-time kernel, it is specifically designed for embedded systems.
Therefore, WinCE is significantly stronger than Linux in terms of task preemption and interrupt response. Additionally, Linux 2.4.19 does not support preemption at the kernel level, which is a major reason why its preemption time is longer than WinCE's. However, this issue has been resolved in Linux kernel version 2.6.

In terms of system call efficiency, WinCE.Net is superior to Linux 2.4.19, but Linux system calls are more POSIX compliant, more standardized, and more open.

In conclusion, WinCE.Net is more suitable than Linux 2.4.19 for embedded systems with high real-time requirements, and WinCE.Net inherits Microsoft's consistent advantages in developing desktop systems, making development easier. However, if the system's real-time requirements are not high, Linux may be a more suitable choice because it can reduce costs, is completely transparent to users, and is easy to modify and customize. If you want to use Linux as the operating system to develop systems with high real-time requirements, you should make appropriate real-time modifications to it, or directly use a Linux kernel that has been modified for real-time performance, such as RTLinux.

4. Summary and Outlook

Compared to traditional pure software testing methods, the testing methods introduced in this paper are characterized by high accuracy and ease of comparison, without significantly increasing testing complexity. Compared to purely hardware testing methods, they offer advantages such as high cost-effectiveness, fewer equipment requirements, and strong scalability, with minimal difference in testing accuracy. However, their functionality is not as powerful as dedicated hardware devices such as logic analyzers and oscilloscopes. The real-time performance index system for embedded operating systems introduced in this paper still has considerable room for improvement and expansion; each index can be further refined.
Further testing using the methods presented in this paper under different load conditions can lead to more comprehensive and objective test results.
