Design of an Embedded Real-Time Operating System Based on TS101
Introduction

With the development of semiconductor manufacturing processes and improvements in computer architecture, DSP chips are becoming increasingly powerful, controlling more and more peripheral devices and running ever more complex software algorithms. Software development for DSP systems therefore involves not only complex algorithms but also a large amount of effort on peripheral devices and related hardware control, making the whole process complex and arduous. Developing a compact, easily portable embedded real-time operating system for this class of system is thus of practical significance. Referring to the open-source real-time operating system uC/OS-II, this paper presents the design and development of an embedded real-time operating system based on the ADSP-TS101S TigerSHARC (hereinafter TS101) DSP chip from Analog Devices (ADI).

1. Introduction to the TS101 Chip

The TS101 is a high-performance static superscalar processor manufactured by Analog Devices (ADI). The ADSP-TS101S core instruction cycle is 3.3 ns, and each cycle can execute 4 instructions, 24 16-bit fixed-point operations, and 6 floating-point operations. Internally, the TS101 features three independent 128-bit wide internal data buses, each connected to a 2 Mbit internal memory block, providing 4 words per cycle of data, instruction, and I/O access and an internal memory bandwidth of 14.4 Gbytes/s. Each of the two computation blocks contains an ALU, a multiplier, a 64-bit shifter, a 32-word register file, and an associated data alignment buffer (DAB). The two integer ALUs (IALUs) each have their own 31-word register file for data addressing.
In addition, the TS101 includes a program controller with an instruction alignment buffer (IAB), a branch target buffer (BTB), and an interrupt controller, along with 6 Mbits of on-chip SRAM. The TS101 provides an external port for connection to a host processor, multiprocessor memory space (other DSPs), external SRAM, and SDRAM, as well as a 14-channel DMA controller, four link ports, two 64-bit interval timers, and a timer-expired pin. An IEEE 1149.1-compatible JTAG interface on the chip can be used for on-chip emulation.

2. Functionality of the TS101-Based Embedded Real-Time Operating System

The embedded operating system described in this paper is designed with reference to the publicly available source code of the real-time operating system uC/OS-II. Like uC/OS-II, it is only a real-time kernel and does not include components such as a GUI or a TCP/IP protocol stack. It supports preemptive multitasking scheduling and provides the essential services (semaphores, mailboxes, queues, delays, timeouts, and so on). Furthermore, concepts such as processes and threads from full-featured operating systems can be introduced on top of uC/OS-II. This design therefore adopts a combination of processes and threads: tasks that perform different functions are treated as processes, and each process is further subdivided into threads. Inter-process scheduling and switching take place between the TS101's internal memory and external extended memory (e.g., SDRAM), while inter-thread scheduling and switching are carried out entirely within the TS101's internal memory. Viewed as a whole, the basic functions of this TS101 embedded real-time system comprise task management, interrupt management, and memory management. Of these three, the task management module is essentially a straightforward port of uC/OS-II.
This paper therefore focuses on the design of the interrupt handling and memory management components.

3. TS101 Interrupt Handling

TS101 interrupt handling covers two main categories: hardware interrupts and software interrupts. Software interrupts comprise software exception interrupts and debug (emulation) interrupts.

3.1 Hardware Interrupts

The TS101 processor does not require a dedicated stack pointer to save the context; the J and K registers of the IALUs can be used as stack pointers. In interrupt service routines that allow nested interrupts, the interrupt return address (the RETI value, accessed as RETIB) should be saved to the stack. After the return address is read from RETIB, the processor automatically re-enables global interrupts. Global interrupts should remain disabled while the related registers and RETIB are being saved; disabling is done automatically when the program controller writes the PC into RETI on interrupt entry. If the system does not support nested hardware interrupts, there is no need to save the processor state to the stack: the interrupt simply returns through the RETI register, and the global hardware-interrupt disable bit requires no handling.

3.2 Software Interrupts (Exceptions)

Software exception interrupts are triggered during program execution. When an exception interrupt is raised, PMASK[62] is set and the PC is stored in RETS; for emulation exceptions, PMASK[63] is set and the PC is stored in DBUG. When an exception interrupt occurs, the program controller fetches from the address held in the IVSW register (for emulation exceptions, from the EMUIR register) and at the same time flushes the instructions in the pipeline.

3.3 Interrupt Return

Interrupt return is achieved by executing the RTI instruction in the interrupt service routine.
Of course, this requires that the return address be stored in the appropriate register when the interrupt service routine is entered. The return address should normally be placed in the RETIB register at least 8 instruction cycles before the return instruction executes, so that the branch target buffer (BTB) can be used.

4. Implementation of Operating System Interrupt Handling

In this operating system, the schematic code of the user interrupt service routine is shown in Program Listing 1. The user should first push the processor's registers onto the current stack [Program Listing 1, (1)]. The operating system needs to know that an interrupt service routine is running, so the user should then call OS_Int_Enter_C(), which simply increments the global variable OSIntNesting [Program Listing 1, (2)]. After these two steps, the user can begin servicing the device that triggered the interrupt [Program Listing 1, (3)]. Because the operating system allows interrupt nesting, it tracks the nesting level in OSIntNesting; to permit nesting, however, the user should in most cases clear the interrupt source before re-enabling interrupts. Calling the interrupt exit function OS_Int_Exit_C() [Program Listing 1, (4)] marks the end of the interrupt service routine; it decrements the interrupt nesting counter by 1. When the counter reaches zero, all interrupts, including nested ones, have completed, and the operating system must determine whether any higher-priority task has been made ready by the interrupt service routine (or by any nested interrupt). If a higher-priority task has entered the ready state, the system switches to that task; otherwise OS_Int_Exit_C() simply returns to its caller [Program Listing 1, (5)].
The saved register values are restored at this point, and then the interrupt return instruction is executed [Program Listing 1, (6)]. Note that if scheduling is disabled (OSIntNesting > 0), the system returns to the interrupted task. Figure 1 illustrates this sequence in detail. At some point an interrupt arrives [Figure 1(1)] but cannot yet be recognized by the processor, either because interrupts are disabled by the operating system or the user application, or because the processor has not finished executing the current instruction. Once the processor responds to the interrupt [Figure 1(2)], it vectors to the interrupt service routine [Figure 1(3)]. Once the processor registers (the CPU context) have been saved [Figure 1(4)], the user interrupt service routine notifies the operating system that it has entered an ISR by calling OS_Int_Enter_C() to increment OSIntNesting [Figure 1(5)]. The user interrupt service code then executes [Figure 1(6)]. Note that the user interrupt service should do as little as possible, leaving most of the work to a task. When the user interrupt service completes, OS_Int_Exit_C() is called [Figure 1(7)]. As the timing diagram shows, if no higher-priority task has been made ready by the interrupt service routine, OS_Int_Exit_C() runs for only a very short time for the interrupted task: the CPU registers are simply restored [Figure 1(8)] and the interrupt return instruction is executed [Figure 1(9)]. If the interrupt service routine has put a higher-priority task into the ready state, OS_Int_Exit_C() takes longer because a task switch is performed [Figure 1(10)]; the register contents of the new task are restored [Figure 1(11)] and the interrupt return instruction is executed [Figure 1(12)].
5. Memory Management

In the TS101's C environment, memory is divided into code, data, heap, and stack areas. The code area stores user code, the data area stores global and static variables, the stack area stores temporary variables, and the heap area provides dynamic memory allocation. The size of each memory partition can be set manually in the compiler's linker description file (.LDF). The library functions provided for the TS101 already include fairly complete memory management routines (the familiar calloc, free, malloc, and realloc), which suffice for basic memory management. In designing the operating system, the focus is therefore on memory expansion, that is, on making effective use of external storage (such as SDRAM). To this end, the design treats a large task as a process and divides that process into smaller threads. Multiple processes can be kept in the system's external storage, but only one is loaded into internal memory at a time for execution; the operating system primarily manages the threads within that process. This allows multiple processes to run within a single system, with the program controlling the swapping of processes between internal memory and external storage, although the swapping incurs some time overhead.

5.1 Using a Heap in External Storage

The TS101 provides a heap area for users, managed through calloc, malloc, realloc, and free. Users can also manually edit the linker description file to obtain a larger heap. By default, however, users can only use the heap defined in the linker description file, which is often far from sufficient. If users could dynamically allocate storage in external memory just as they do in internal memory, it would be a great convenience, and fortunately the TS101 tool chain provides this capability.
Users only need to modify the assembly file "ts_hdr.asm" provided by the system together with the linker description file: compile "ts_hdr.asm", then replace "ts_hdr_TS101" in the linker description file with the generated "ts_hdr.doj". The modification process is briefly as follows. The linker description file defines the default heap region, giving its base address and size. The default heap can then be manipulated in the ts_hdr.asm assembly file, whose main job here is to assign ID number 0 to the default heap so that it remains easy to use once a new heap region exists. The code in ts_hdr.asm numbering the default heap region includes:

.var = ldf_defheap_base;
.var = ldf_defheap_size;
.var = 0;

To add a heap, the user simply describes the new heap region in the linker description file and numbers it in the assembly file. The new heap region is allocated in external memory (SDRAM) and is given ID number 1. After modifying the files as above, the user can dynamically allocate memory in the external storage area; the compiler also provides a family of library functions for such dynamic allocation. In the following program, line (1) dynamically allocates a block of size 50 in the default heap in internal memory, and line (2) dynamically allocates a block of size 256 in the external storage area:

int *x, *y;
x = heap_malloc(0, 50);   /* (1) */
y = heap_malloc(1, 256);  /* (2) */

5.2 Memory Overlay

The TS101 allows a large amount of program code to be stored in external memory; a small portion of code is read into internal memory for execution each time via DMA transfer, expanding the effective memory space while saving time compared with executing all code from external memory. This technique is called memory overlay.
Memory overlay is a many-to-one memory mapping technique: multiple code segments are stored at different locations in external memory but run at the same location in internal memory. The area where the code is stored in external memory is called the "live" area, and the area where it executes in internal memory is called the "run" area. Figure 2 shows the structure of overlay usage. As the figure shows, overlay1 and overlay2 run in the same area of internal memory, and overlay3 and overlay4 likewise share a run area. When the main function calls FUNC_B, overlay2 is swapped into memory for execution; when the main function calls FUNC_A, overlay1 replaces overlay2. Overlay3 and overlay4 are used in the same way as overlay1 and overlay2. The movement of code between internal and external memory is performed mainly by DMA transfer. The memory overlay manager is a user-written subroutine that loads functions or data into internal memory. It works in conjunction with the linker's `PLIT{}` commands to perform overlay operations. Besides loading code from external storage into internal memory, the overlay manager is also responsible for setting up the stack, saving register values, checking whether the function to be called is already in memory, and using DMA to perform overlay loading while other functions execute. The overlay operations are assisted through the linker description file, which here defines two overlay code blocks, `OVLY_one` and `OVLY_two`. `OVLY_one` contains the function `FUNC_A.doj`, while `OVLY_two` contains `FUNC_B.doj` and `FUNC_C.doj`; both run in the same internal memory region, `MOCode`. The `PLIT{}` command can also be defined in the linker description file to assist in completing the memory overlay operations.
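For illustration only, the overlay and PLIT declarations might look roughly like the following sketch. The segment names `OVL_Live` and `MOCode`, the register choices, and the overlay manager symbol are placeholders, and the exact `OVERLAY_INPUT` and `PLIT{}` syntax should be taken from the VisualDSP++ linker manual rather than from this sketch:

```
/* Two overlays sharing one run space (sketch only). */
OVLY_one {
    OVERLAY_INPUT {
        OVERLAY_OUTPUT( OVLY_one.ovl )
        INPUT_SECTIONS( FUNC_A.doj( program ) )
    } > OVL_Live      /* "live" space in external memory */
} > MOCode            /* shared "run" space in internal memory */

OVLY_two {
    OVERLAY_INPUT {
        OVERLAY_OUTPUT( OVLY_two.ovl )
        INPUT_SECTIONS( FUNC_B.doj( program ) FUNC_C.doj( program ) )
    } > OVL_Live
} > MOCode

/* PLIT stub generated for each overlay function: hand the overlay
   ID and target address to the overlay manager, then jump to it. */
PLIT {
    J0 = PLIT_SYMBOL_OVERLAYID;;
    J1 = PLIT_SYMBOL_ADDRESS;;
    JUMP _OverlayManager;;
}
```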
When the main function calls a function located in a memory overlay area, the linker redirects the function call. For example, when the main function calls the overlay function FUNC_A, the linker automatically converts the call into a call to .plt_FUNC_A. This executes the PLIT operation before the function itself, jumping first to the overlay manager and then on to FUNC_A; the PLIT code itself is defined in the linker description file.

5.3 Implementation of Memory Expansion

The operating system provides the OS_Process_Sched() function to carry out process switching. The switching sequence in this function is: (1) push the register values of the in-memory process onto its stack; (2) move the entire memory area of that process to external storage via DMA; (3) call the register-pop function of the process held in external storage; (4) jump to the new process and run it. In process scheduling, all register values are first pushed onto the current process stack [step (1)]. Then everything in the process's memory area is saved to external memory, so that the original running environment can be fully restored when the process re-enters internal memory [step (2)]; "everything" here means all data belonging to the current process, including the process stack, process global variables, and dynamically allocated memory blocks. Calling the register-pop function of the process in external memory [step (3)] relies mainly on the memory overlay technique described above: the pop function is normally placed in external memory, and calling it makes the linker jump to the memory overlay manager, which loads the process from external memory into internal memory.
However, the memory overlay manager must be modified here: a routine that loads all of the process's data from external memory must be added. The program can then jump to the new process and begin running [step (4)].

6. Conclusion

Building on this study of embedded real-time operating systems, this paper completes the design of an embedded real-time operating system based on the TS101 DSP chip. The system architecture mainly follows the open-source real-time operating system uC/OS-II, with innovations and redesigns made on that basis according to the characteristics of the chip itself and the needs of actual applications. These are mainly reflected in three aspects: (1) the interrupt handling component was designed around the characteristics of the TS101 chip; (2) instead of allocating a separate block of memory to each task, as general-purpose operating systems do, memory is managed in blocks, with all tasks sharing and uniformly managing the same memory; (3) memory overlay technology was studied and implemented to match actual system needs, expanding the system's storage space. Of course, any embedded operating system design progresses from the simple to the detailed and must be improved step by step. This paper implements only the basic functions of the TS101 embedded real-time operating system; it is expected that, after long-term simulation and practical operation, a more complete, stable, and reliable embedded real-time operating system can be established.