1. Requirements for Memory Allocation in Embedded Systems
① Speed. The real-time requirements of embedded systems demand that memory allocation be as fast as possible. The complex, sophisticated allocation strategies found in general-purpose operating systems are therefore out of the question; simple and fast allocation schemes are used instead. Of course, programs with different real-time requirements will use different schemes. For example, VxWorks uses a simple first-fit method with immediate coalescing of freed blocks, while VRTX uses a scheme of multiple fixed-size bins.
② Reliability. Memory allocation requests must be satisfied; allocation failure can have catastrophic consequences. Embedded systems operate in diverse environments, some with extremely high reliability requirements. For instance, if an autonomous driving system detects an impending collision but cannot respond because a memory allocation failed, a fatal accident could occur; this is unacceptable.
③ Efficiency. Memory allocation should minimize waste. It is impossible to configure infinitely large memory just to satisfy every allocation request. On the one hand, the cost constraints of embedded systems make memory a very limited resource; on the other hand, even setting cost aside, the limited enclosure space and board area of the system cap the memory capacity that can be installed.
2. Static Allocation vs. Dynamic Allocation
Whether to use static or dynamic allocation has always been a hotly debated topic in embedded system design. Naturally, the best answer is to adopt different solutions for different systems. If a system has extremely high real-time and reliability requirements (a hard real-time system) and cannot tolerate even a slight delay or a single allocation failure, then a static allocation scheme is necessary, meaning that all required memory is reserved when the program is built.
For example, the embedded system on a Mars rover must use a static allocation scheme. Likewise, OSEKWorks, a real-time operating system from Wind River designed specifically for automotive electronics and industrial automation, does not support dynamic memory allocation at all. In such applications, cost is not the priority; real-time performance and reliability are paramount. An unavoidable drawback of static allocation is the loss of flexibility. All required memory must be known and reserved in advance, during the design phase, and every possible scenario must be considered beforehand, because the system cannot handle unforeseen situations. Such a scheme inevitably leads to significant waste: memory must be sized for the worst case, while only a small portion of it may actually be used during operation. Furthermore, without changing the hardware platform, it is impossible to flexibly add functionality to the system, making upgrades difficult.
Most systems are a combination of hard and soft real-time tasks: some tasks have strict deadlines, while others only need to complete as quickly as possible. According to RMS (Rate Monotonic Scheduling) theory, such systems must employ preemptive task scheduling, but the tasks with less stringent reliability and real-time requirements can be served by dynamic memory allocation. The advantage of dynamic allocation is that it gives designers great flexibility. It makes it easy to port programs originally written for non-embedded operating systems, such as the network protocol stacks used in many embedded systems; porting such a stack onto static allocation would be far more difficult. Dynamic allocation also lets designers adjust the system's functionality and trade features off against each other without changing the underlying hardware platform.
For example, adjustments can be made between the number of supported VLANs and the number of supported routing entries, or different software versions can support different protocols. In short, dynamic memory allocation leaves embedded system programmers with fewer restrictions and greater freedom. For this reason, most real-time operating systems provide dynamic memory allocation interfaces, such as the malloc and free functions.
3. Memory Allocation Interfaces Provided by an RTOS
Different RTOSs employ different memory allocation strategies that reflect their different positioning. For example, VRTX uses a memory allocation scheme similar to the binning algorithm developed by Doug Lea and used in GNU C, in which system memory is divided into a set of fixed-size memory blocks. The advantage of this method is fast lookup and freedom from external fragmentation. Its disadvantages are equally obvious: because only a finite set of block sizes is available, a larger block must often be used to satisfy a smaller request, so waste accumulates; moreover, managing such an allocation table is a considerable burden for the operating system.
The following sections detail the memory allocation strategy of VxWorks, a widely used RTOS from Wind River. VxWorks' predecessor was VRTX, and its name is said to derive from the phrase "make VRTX work". VxWorks' memory management functions reside in two libraries: memPartLib (the compact memory partition manager) and memLib (the full memory partition manager). The former, memPartLib, provides tools for allocating memory blocks from memory partitions. It contains two types of routines: general-purpose tools that create and manage memory partitions and allocate and manage memory blocks within them, and the standard malloc/free routines, which are provided as an interface to the system memory partition. The system memory partition (whose ID, memSysPartId, is a global variable) is created by usrRoot calling memInit during kernel initialization.
Its starting address is immediately after the VxWorks BSS segment in RAM, and its size is all remaining free memory, as shown in Figure 1. When creating other partitions, it is generally necessary to first call malloc to obtain a region of memory from the system partition. A memory partition is defined by the structure mem_part, which contains an object marker, a doubly linked list of free blocks, a semaphore protecting the partition, and some statistics such as total size, minimum block size, debugging options, number of allocated blocks, and allocated size. The declaration is as follows:

    typedef struct mem_part
        {
        OBJ_CORE  objCore;             /* object marker */
        DL_LIST   freeList;            /* free list */
        SEMAPHORE sem;                 /* semaphore protecting the partition */
        unsigned  totalWords;          /* size of the partition in words */
        unsigned  minBlockWords;       /* minimum block size in words */
        unsigned  options;             /* options for debugging or statistics */
        /* allocation statistics */
        unsigned  curBlocksAllocated;  /* number of currently allocated blocks */
        unsigned  curWordsAllocated;   /* number of currently allocated words */
        unsigned  cumBlocksAllocated;  /* cumulative number of allocated blocks */
        unsigned  cumWordsAllocated;   /* cumulative number of allocated words */
        } PARTITION;

Generally there is only one memory partition in the system, the system partition, and all memory required by tasks is allocated directly from it by calling malloc. Allocation uses the first-fit algorithm (note that this algorithm is prone to producing fragmentation), and memory released by free is coalesced with neighboring free memory to form larger free blocks. This is the memory allocation mechanism of VxWorks. An allocation can specify an alignment; note that different CPU architectures have different alignment requirements. To optimize performance, the pointer returned by malloc is aligned, and the per-block overhead this entails varies with the architecture.
For example, the 68K is 4-byte aligned with an overhead of 8 bytes; SPARC is 8-byte aligned with an overhead of 12 bytes; MIPS is 16-byte aligned with an overhead of 12 bytes; and the i960 is 16-byte aligned with an overhead of 16 bytes. The memLib library provides enhanced memory partition management tools, adds some interfaces, and allows debugging options to be set. It can detect two classes of errors: ① attempts to allocate too much memory; ② bad blocks discovered when memory is freed. There are four error-handling options, combinations of logging a message and suspending the offending task when an error occurs.
When using dynamic allocation via malloc/free, the following limitations should be kept in mind. ① Because the system memory partition is a critical resource protected by a semaphore, malloc can block the calling task, so it must not be used in interrupt service routines. ② Because allocation runs a search algorithm whose execution time depends on the current state of system memory and is therefore unpredictable, it is unsuitable for operations with hard time limits. ③ Because a simple first-fit algorithm is used, the system easily accumulates many memory fragments, reducing memory utilization and system performance.
To address this, system designs generally combine static and dynamic allocation: for the important applications, the required memory is allocated during system initialization and is never released while the system runs, avoiding the problems that repeated allocation and release would cause. Moreover, because there is no fragmentation at initialization time, even requests for large blocks are easily satisfied. Other applications allocate memory dynamically at run time.
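The init-time strategy just described can be sketched as follows. This is a minimal illustration under assumed names and sizes (a hypothetical receive ring for an important task), not code from any particular RTOS:

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed worst-case sizing, fixed at design time. */
#define RX_RING_ENTRIES 64
#define RX_BUF_SIZE     1536

static unsigned char *rx_ring[RX_RING_ENTRIES];

/* Called once from system initialization, while the heap is still
   unfragmented; after this the task never calls malloc or free. */
int rx_ring_init(void)
{
    for (size_t i = 0; i < RX_RING_ENTRIES; i++) {
        rx_ring[i] = malloc(RX_BUF_SIZE);
        if (rx_ring[i] == NULL)
            return -1;              /* fail early, at init time */
    }
    return 0;                       /* buffers are reused, never freed */
}
```

Because every buffer is obtained while the heap is still whole and is never returned, the task sees neither allocation latency nor allocation failure at run time.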
Especially for applications that need a large number of small, fixed-size memory blocks, a scheme of allocating once and using many times can be applied. The following sections describe this scheme and its application scenarios in detail.
4. The Allocate-Once, Use-Many-Times Memory Allocation Scheme
In embedded system design there are often applications that resemble in-memory databases. They are characterized by trees maintained in memory, such as the MAC address table and VLAN tables in an Ethernet switch, or the routing table in a router. These trees consist of many nodes of the same size, which makes it possible to allocate a large buffer pool, for example an array of memory cells, each cell holding one node. A free list keeps track of the unused cells in the array. Each time the program needs memory for a new node, it takes one cell off the free list and hands it to the caller; when the program deletes a node and releases its memory, it returns the cell to the free list. If the list is empty, malloc is called again to obtain a large block of system memory as a new buffer pool.
The main advantages of this approach are: ① it reduces the number of malloc/free calls, and with them both risk and fragmentation; ② because taking a cell from the pool takes deterministic time (except when the pool is exhausted and malloc must be called again), it can be used in time-critical situations to guarantee real-time behavior; ③ it gives users the freedom to add debugging and statistics hooks around allocation and release, to better monitor the system's memory usage. The approach necessarily involves a buffer pool structure. A typical one consists of the following parts: cell size, block size (or number of cells), buffer pool pointer, free list, and parameters for statistics and debugging.
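A minimal sketch of such a pool in C may make the idea concrete (sizes and names here are assumptions for illustration, not taken from any product). Free cells are threaded into the free list through their own storage, so both allocation and release are O(1) list operations:

```c
#include <stddef.h>

#define CELL_SIZE  32   /* assumed node size */
#define NUM_CELLS  16   /* assumed pool size */

/* While a cell is free it holds a next-pointer; once allocated,
   the same bytes hold the caller's node data. */
typedef union cell {
    union cell   *next;
    unsigned char data[CELL_SIZE];
} cell_t;

static cell_t  pool[NUM_CELLS];
static cell_t *free_list;

void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < NUM_CELLS; i++) {  /* push every cell */
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

void *cell_alloc(void)
{
    cell_t *c = free_list;
    if (c != NULL)
        free_list = c->next;     /* O(1), deterministic time */
    return c;                    /* NULL when the pool is exhausted */
}

void cell_free(void *p)
{
    cell_t *c = p;
    c->next = free_list;         /* O(1) return to the pool */
    free_list = c;
}
```

A full implementation would fall back to malloc for a new block when `cell_alloc` finds the list empty, as the text describes.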
Operations on the buffer pool include creating the pool, destroying the pool, allocating one memory cell from the pool, and releasing a cell back to the pool. Two examples are given below to illustrate the concrete use of this approach.
4.1 Memory Allocation in Intel Switch Drivers
In switch solutions based on Intel switching chips, software address learning is used, so a lot of data must be maintained in memory, such as soft copies of the MAC address table, VLAN tables, static unicast address tables, and multicast address tables. These tables are built from trees, each consisting of fixed-size nodes. Typically each node is tens of bytes, and the number of nodes in a tree can grow from tens up to a maximum of 16K. The scheme is therefore very well suited, and the concrete implementation is as follows.
(1) The buffer pool structure BlockMemMgr

    typedef struct
        {
        MemSize data_cell_size;   /* size of a data cell */
        MemSize block_size;       /* size of a block */
        /* MAX_BLOCKS_OF_MEM_SIZE is the maximum number of blocks
           each manager can contain, e.g. 64 */
        unsigned short blocks_being_used;        /* number of blocks in use */
        void  *mem_ptr[MAX_BLOCKS_OF_MEM_SIZE]; /* array of block pointers */
        SLList free_data_cells_list;             /* free list */
        } BlockMemMgr;

The fields of the structure are: cell size, block size, number of blocks in use, the addresses of all blocks, and the free list (a singly linked list).
(2) Buffer pool management functions
◆ block_mem_create: creates a block memory manager. Its parameters are a memory pointer (NULL means the manager allocates the memory itself), the block size, and the cell size; it returns a manager pointer. The procedure is as follows: ① Check parameter validity. ② Align the cell size to 4 bytes and compute the number of cells per block; align the memory pointer to 4 bytes, or allocate memory if none was supplied. ③ Initialize the BlockMemMgr structure, including the cell size and block size, and set the pointer to the first memory block.
If the memory was supplied externally, clear the blocks-in-use count (set it to 0), indicating that no new blocks may be added; otherwise set the number of blocks in use to 1. ④ Create the free list and add every cell of the block to it, with the last cell at the head of the list. ⑤ Return the BlockMemMgr.
◆ block_mem_destroy: destroys a block memory manager, releasing all the memory it allocated; the caller is responsible for releasing externally supplied memory. The parameter is the BlockMemMgr; it returns a success/failure flag. ① Check parameter validity. ② Delete the singly linked list (set the list pointer to NULL). ③ If the blocks were dynamically allocated, release them. ④ Release the BlockMemMgr structure.
◆ block_malloc: allocates one cell from the block memory manager. The parameter is the BlockMemMgr; it returns a pointer to a data cell. ① Check parameter validity. ② Check whether the free list is empty (NULL). If it is empty, check whether a new block may be dynamically allocated; if not, return failure; if so, allocate one block and perform the same setup as block_mem_create. ③ Take the first cell off the free list and return its pointer. Note the trick here: while a cell is free, it stores the free list's node information; after allocation, it stores the caller's data.
◆ block_free: releases one data cell back to the block memory manager. Be careful not to free the same cell twice. The parameters are the BlockMemMgr and the cell pointer. ① Check parameter validity. ② Compare addresses to determine which block the cell belongs to. ③ Check whether the cell's content looks like free-list node information (that is, the address of a cell within the block), and thereby detect a double free. ④ Insert the cell at the head of the free list. ⑤ Set the caller's pointer to the cell to NULL.
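The allocation path and the double-free heuristic described above can be sketched as follows. All identifiers are illustrative, not the driver's actual API, and a single block stands in for the block array; note the check is a heuristic: it misses a double free made while the free list was empty, and could false-positive if user data happens to look like an in-block cell address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { CELL_SIZE = 32, CELLS_PER_BLOCK = 8 };  /* assumed sizes */

typedef struct mgr {
    unsigned char *block;      /* one block, for simplicity */
    void          *free_list;  /* head of the free-cell list */
} mgr_t;

/* Is p the address of a cell inside our block? */
static int in_block(const mgr_t *m, const void *p)
{
    const unsigned char *c = p;
    return c >= m->block
        && c <  m->block + CELL_SIZE * CELLS_PER_BLOCK
        && (uintptr_t)(c - m->block) % CELL_SIZE == 0;
}

mgr_t *mgr_create(void)
{
    mgr_t *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->block = malloc(CELL_SIZE * CELLS_PER_BLOCK);
    if (m->block == NULL) { free(m); return NULL; }
    m->free_list = NULL;
    for (int i = CELLS_PER_BLOCK - 1; i >= 0; i--) {  /* thread the cells */
        void **cell = (void **)(m->block + i * CELL_SIZE);
        *cell = m->free_list;
        m->free_list = cell;
    }
    return m;
}

void *block_malloc(mgr_t *m)
{
    void **cell = m->free_list;
    if (cell == NULL)
        return NULL;         /* a full version would malloc a new block */
    m->free_list = *cell;
    *cell = NULL;            /* erase the list info before handing out */
    return cell;
}

int block_free(mgr_t *m, void *p)
{
    void **cell = p;
    if (!in_block(m, p))
        return -1;           /* not one of our cells */
    if (*cell != NULL && in_block(m, *cell))
        return -1;           /* looks like list info: likely double free */
    *cell = m->free_list;
    m->free_list = cell;
    return 0;
}
```

The same storage serves as list node while free and as payload while allocated, which is exactly the "trick" noted in step ③ of block_malloc.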
The memory management code follows these conventions: ① the managed memory is real, writable memory; ② allocation is 4-byte (32-bit) aligned; ③ block_malloc and block_free are safe at interrupt level, except when a block has no free cells left and a new block must be obtained with malloc (malloc and free themselves are not interrupt-safe: they use semaphores and search algorithms, and could cause an interrupt service routine to block). block_mem_create and block_mem_destroy, of course, must be called at task level.
4.2 Memory Allocation in TMS
TMS is a development package from Wind River for managed switches. It uses IDB to manage the data of various protocols, such as STP and GVRP. To support IDB it implements its own buffer pool management scheme, located in bufPoolLib.c. The scheme provides functions for buffer pool management, allowing a fixed number of fixed-size buffers to be allocated from a single pool. By pre-allocating a fixed number of fixed-size buffers, it avoids the memory fragmentation and waste that repeated allocation and release of small blocks would cause; and since the whole pool is allocated as a single block, it is also more space-efficient than allocating each buffer individually. The module adds a marker (MAGIC) to each buffer and checks it on release. It lets users define callback functions for the allocation and release operations, enabling automatic object construction and destruction, and allowing an object whose members come from several buffer pools to be deleted as a single entity. This resembles automatic construction and destruction of objects in C++, but in C and without that overhead. The module allows the buffer pool's memory either to be allocated by the pool itself (via calloc) or to be created in user-supplied space.
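The constructor/destructor callback idea can be sketched like this. It is a simplified model under assumed names, not the actual bufPoolLib interface: the pool runs a user "constructor" on every allocation and a "destructor" on every free, so buffers leave the pool initialized.

```c
#include <stddef.h>

/* Callback signature: returns 0 on success. Names are illustrative. */
typedef int (*buf_cb_t)(void *buf, unsigned long arg);

enum { BUF_SIZE = 64, NUM_BUFS = 4 };  /* assumed sizes */

/* While free, a buffer holds a next-pointer; otherwise, user data. */
typedef union buf {
    union buf    *next;
    unsigned char data[BUF_SIZE];
} buf_t;

typedef struct pool {
    buf_t        *free_list;   /* singly linked list of free buffers */
    buf_cb_t      create_fn;   /* "constructor", run on allocation */
    buf_cb_t      destroy_fn;  /* "destructor", run on free */
    unsigned long arg;         /* argument passed to both callbacks */
} pool_t;

static buf_t store[NUM_BUFS];

void pool_init(pool_t *p, buf_cb_t cfn, buf_cb_t dfn, unsigned long arg)
{
    p->create_fn  = cfn;
    p->destroy_fn = dfn;
    p->arg        = arg;
    p->free_list  = NULL;
    for (int i = NUM_BUFS - 1; i >= 0; i--) {  /* thread buffers into list */
        store[i].next = p->free_list;
        p->free_list  = &store[i];
    }
}

void *pool_alloc(pool_t *p)
{
    buf_t *b = p->free_list;
    if (b == NULL)
        return NULL;
    p->free_list = b->next;
    if (p->create_fn && p->create_fn(b, p->arg) != 0) {
        b->next = p->free_list;    /* constructor failed: put it back */
        p->free_list = b;
        return NULL;
    }
    return b;
}

void pool_free(pool_t *p, void *buf)
{
    buf_t *b = buf;
    if (p->destroy_fn)
        p->destroy_fn(b, p->arg);  /* as in TMS, the buffer is returned
                                      even if the destructor errors */
    b->next = p->free_list;
    p->free_list = b;
}
```

Grouping construction with allocation this way is what lets compound objects be created and torn down as a unit, as the text describes.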
The module uses a singly linked list to maintain the unallocated buffers, and does not track allocated buffers. The module is not task-safe; users must protect the buffer pool with a semaphore.
(1) The buffer pool structure

    typedef struct
        {
        ulong_t magic;                             /* marker checked for consistency */
        BOOLEAN localAlloc;                        /* whether the pool allocated its own memory */
        SL_LIST freeList;                          /* free list */
        void   *store;                             /* pointer to the buffer memory */
        STATUS (*createFn)(void *, ulong_t arg1);  /* callback run when a buffer is allocated */
        STATUS (*destroyFn)(void *, ulong_t arg1); /* callback run when a buffer is released */
        ulong_t argVal;                            /* argument passed to the callbacks */
        } buf_pool_t;

The fields are: the MAGIC check marker, whether the memory was locally allocated, the free list, the memory pointer, the callback to run on allocation, the callback to run on release, and the callback argument.
(2) Related functions
◆ bufPoolInitializeStorage: allocates and initializes the storage area. The parameters are the memory address (if NULL, memory is allocated locally), the buffer size, and the number of buffers. ① Compute the required memory size from the buffer size and count. ② If the pointer is NULL, call calloc to allocate the memory and set the local-allocation flag. ③ Initialize the memory to 0. ④ Set up the pointers: the allocated memory block begins with the buf_pool_t structure, followed immediately by the actual buffer area. buf_pool_t holds the check marker, the local-allocation flag, the memory address, the allocation callback, the release callback, and the callback argument; at this point only the memory pointer is set.
◆ bufPoolCreate: creates the buffer pool. The parameters are the memory pointer, the buffer size and count, the creation callback, the release callback, and the callback argument.
① Align the size. ② Call bufPoolInitializeStorage to initialize the memory area and the buf_pool_t structure. ③ Fill the buf_pool_t structure with the passed parameters. ④ Add the buffers to the free list, with the last buffer at the head.
◆ bufPoolDestroy: deletes the buffer pool. The parameter is a pointer to the buf_pool_t. ① Check that the MAGIC field of the structure is intact. ② If the memory was locally allocated, release it.
◆ bufPoolAlloc: allocates a buffer from the pool; the parameter is a pointer to the pool structure. If a free buffer exists, it is removed from the free list and handed to the caller, with the creation callback executed on it; if the callback returns an error, the buffer is put back on the free list. ① Check that the MAGIC marker in the pool structure is intact. ② Remove the first node from the free list. ③ If the node is not NULL, clear it and call the creation callback with its address as the argument. ④ If the callback returns an error, return the node to the free list. ⑤ Return the address of the obtained buffer.
◆ bufPoolFree: returns a buffer to the pool. If a release callback is defined, it is called before the buffer is returned. The parameters are the pool structure and a pointer to the buffer. ① Check that the pool's MAGIC marker is intact. ② If a release callback is defined, call it; if it returns an error, set the error number. ③ Add the buffer to the head of the free list. Note two points about this function: ① even if the callback returns an error, the buffer is still returned to the pool; ② it does not check whether a buffer is freed twice, unlike the Intel driver. In addition, TMS's buffer pool has no block mechanism, so there is no need to work out which block a cell belongs to, which simplifies the operations.
5. Summary
Many embedded applications build their own memory management schemes on top of the malloc/free provided by the RTOS. The purpose of such a scheme is twofold: first, to reduce dependence on malloc/free and thereby avoid memory fragmentation, timing uncertainty, and the like; second, to strengthen the program's error checking and catch memory usage errors. For the database-like memory requirements widespread in embedded systems, that is, the need to allocate many fixed-size memory cells, the allocate-once, use-many-times scheme is undoubtedly a good solution, and the two examples presented in this article demonstrate its advantages well.