Analysis of ARM multi-core and MIPS multi-threaded embedded processor technologies
2026-04-06 04:13:29
Building multi-core (homogeneous or heterogeneous) and multi-threaded technologies into embedded devices does bring many benefits, especially in system performance. Although RISC embedded technology faces growing challenges, it remains a viable solution because it maintains compatibility with existing embedded software, improves future applicability, and effectively raises the performance of new systems.
The application determines whether multi-core or multi-threading is used
Although multi-core and multi-threaded technologies contribute to performance, performance does not depend solely on integrating them; what matters most is the requirements of the application environment. For example, the SoCs integrated into mobile phones are multi-core designs, but most of them are application processors whose integrated cores are not all of the same architecture. Homogeneous multi-core architectures are actually very rare in practical embedded systems. Multi-threaded processors play a crucial role in automotive electronics and embedded networking. Some manufacturers, however, combine several multi-threaded chips to build computing architectures that are both multi-core and multi-threaded. In other words, it is not simply a matter of choosing one over the other; many manufacturers customize or develop solutions around the specific application requirements. This also means that, when choosing the basic architecture for an embedded system, the processor itself is only one component of the application. Reaching the performance the application requires involves different considerations for each product.
Beyond mere technological competition: the truly homogeneous multi-core architecture, ARM11 MPCore
ARM holds a leading position in embedded application processor technology.
Although ARM itself does not own a wafer fab and sells its processor architecture purely as IP, its sound positioning has earned it a significant market position within just a few years, and the vast majority of handheld devices worldwide embed ARM processor technology. Looking at its development history, the early ARM7 architecture could handle some audio encoding and decoding applications. With the addition of 16-bit saturating instructions and a higher core clock, the ARM9 can handle not only audio encoding and decoding but also MPEG-4 encoding at QCIF resolution (one quarter of CIF) and 15 frames per second at approximately 80 MHz. The further speed increase and the SIMD instructions added in the ARM11's ARMv6 instruction set architecture enable H.264 encoding at VGA resolution. With the latest Cortex-A8 and its 64-bit SIMD-based NEON accelerator, MPEG-4 VGA encoding at 30 frames per second can be achieved in only about half the clock cycles the ARM11 needs; in practice, this requires approximately 300 MHz. To make these capabilities more accessible to users, ARM is developing a prototype parallelizing compiler that extracts data parallelism and maps it onto the SIMD hardware.
△Caption: Schematic diagram of the ARM11 MPCore structure.
The ARM11 MPCore is based on the ARM11 core and implements the ARMv6 instruction set architecture. Depending on application requirements, MPCore can be configured with one to four processors, with a maximum performance of approximately 2600 Dhrystone MIPS according to official specifications. MPCore is a standard homogeneous multi-core processor built from up to four ARM11-based processor cores. The advantage of a multi-core design is that it significantly raises processor performance without raising the clock frequency, which promises excellent performance in multitasking applications and suits the needs of future consumer electronics.
For example, a set-top box can record multiple TV channels while simultaneously receiving digital video-on-demand programs over the internet; an in-car navigation system can provide navigation while still having the capacity to stream video entertainment to rear-seat passengers. In such applications, multi-core embedded processors show a significant performance advantage. According to the manufacturer's data, the MPCore multiprocessor supports up to four-way coherent symmetric multiprocessing (SMP), four-way asymmetric multiprocessing (AMP), and hybrid systems that mix symmetric and asymmetric processing. In theory, this highly flexible design can meet the rapidly varying performance requirements of applications across many domains while maintaining high responsiveness and data throughput.
However, the ARM11 MPCore was released as early as 2004 and officially entered licensing in 2005. To date, products using this processor are concentrated in home appliances and automotive electronics, and they are not numerous. Is this because the industry's demand for processor computing power has not yet materialized? In automotive electronics, the demands placed on microprocessors are steadily increasing. Single-core processors were generally sufficient for most cars in the past, but the growing number of electronic assistance devices integrated into vehicles has made processing tasks more complex, far exceeding the capabilities of traditional automotive microcontrollers. It is therefore expected that more and more automakers will adopt similar multi-core architectures in the coming years to achieve reasonable system responsiveness. In home appliance applications, however, few products require such complex cores.
In the most common audio-visual products, most manufacturers use dedicated hardware codec circuits or DSPs for encoding and decoding; using a multi-core processor directly for this work is not particularly efficient. In mobile applications, power consumption remains a primary concern for product manufacturers. Even though the ARM11 MPCore achieves very low power consumption with multiple cores operating simultaneously, it still cannot match the single-core version, which limits its presence in mobile applications. However, with Intel's push for the MID (Mobile Internet Device), similar products are expected to present a significant opportunity for the ARM11 MPCore architecture. Even Silverthorne, the 45nm successor to Stealey, still consumes more than five times the power of MPCore (counting the chipset's power consumption) and is only a single-core design, making it significantly less flexible in application than MPCore. It is worth noting, however, that Silverthorne comes with a vast base of x86 software, an area in which ARM and other RISC-based processors are clearly at a disadvantage.
For RISC-based MID-like products, ARM's latest processor architecture, the Cortex-A8, can also be considered. This processor is based on the latest ARMv7 architecture and integrates a 64-bit DSP processing unit, providing excellent acceleration for streaming applications and making it well suited to multimedia and even gaming applications in MID-like handheld devices. Strictly speaking, the Cortex-A8 can also be considered a multi-core architecture, but unlike the homogeneous MPCore, it is a heterogeneous design consisting of a general-purpose processor core and a DSP core. ARM is believed to have drawn heavily on Texas Instruments' application processor development experience in this regard.
△Caption: Schematic diagram of Cortex-A8 structure.
In fact, Nokia's N770/N800 already offered all the functions of a MID and was even thinner and lighter. Unfortunately, with its original 1500mAh rechargeable battery, its continuous usage time was only 3.5 hours, not much different from typical UMPC products on the market and slightly inferior to Intel's MID products. The power-saving advantage of an ARM-based processor (the N800 uses an i.MX31 application processor based on the ARM1136J(F)-S core) was not evident here, although the standby time was slightly longer than that of a MID.
MIPS's insistence on the multi-threaded approach can perhaps be seen as a clash of egos
MIPS follows a different technology development strategy from ARM. ARM develops multi-processor (MP) cores, while MIPS focuses on multi-threading (MT). From an application perspective, both MP and MT aim to improve overall processor performance by reducing the processing time of the software threads an application is running. However, the two technologies use different hardware architectures to reduce processing time, so the performance improvement they deliver for any given program differs. This difference is closely tied to the two IP vendors' development philosophies. MT technology emphasizes efficient use of the processing units and memory controller, minimizing transistor count while still improving performance. This contrasts sharply with the MP approach of adding as many cores as the system's performance requirements demand. MP offers broader application scope but is somewhat extravagant; MT strikes a finer balance between cost and performance.
Many people compare multi-core (MP) and multi-threaded (MT) architectures, but to some extent such comparisons are meaningless, because their fundamental design concepts are very different and the choice between them cannot be generalized. Technically, to achieve hardware multitasking, both architectures demand considerably more complex software optimization than a single-core architecture. MT architectures may be more complex when it comes to avoiding resource conflicts over the processing units and memory controller, but MP architectures face similar challenges to some extent (especially multi-core designs that share caches and memory controllers). Both instruction-level and thread-level multitasking differ greatly from the traditional single-core, single-threaded programming style and optimization methods.
The problem that MT designs typically address is that, in a single processor core, memory access speed often fails to keep up with rising processor frequency. This leads to cache misses that leave the execution pipeline idle for long periods. The fastest storage in a system is the processor's registers, followed by the L1 cache, the L2 cache, and finally main memory; the speed difference between the extremes can be as much as thousands of times. When the processor needs instructions or data, it first fetches them from the cache, loads them into registers for processing, and then writes the final result back to the cache. When idle, the data is then written back to main memory. When the processor issues an access to the cache and the required data is not there, it must spend considerable time retrieving it from main memory. This wasted time can amount to tens of clock cycles, during which the processing pipeline idles while waiting for the data to arrive.
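The cost of the pipeline stalls described above can be estimated with a simple model. The Python sketch below is illustrative only: the function name, the miss rate (one miss per 100 instructions), and the 30-cycle penalty are assumptions chosen for the example, not figures from ARM or MIPS. It also assumes an idealized core that issues one instruction per cycle and can switch to another ready thread at no cost.

```python
def pipeline_utilization(miss_rate: float, miss_penalty: int, threads: int = 1) -> float:
    """Fraction of cycles in which the pipeline issues useful work.

    Idealized model (an assumption, not a vendor formula): each thread issues
    one instruction per cycle until it misses in the cache, then waits
    miss_penalty cycles. Switching to another ready thread costs nothing.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if miss_penalty < 0 or threads < 1:
        raise ValueError("miss_penalty must be >= 0 and threads must be >= 1")
    # Average cycles one thread needs per instruction, stalls included.
    cycles_per_instruction = 1.0 + miss_rate * miss_penalty
    # Several threads can overlap their stalls, but the pipeline cannot
    # be more than 100% busy.
    return min(1.0, threads / cycles_per_instruction)


if __name__ == "__main__":
    # Hypothetical workload: 1 miss per 100 instructions, 30-cycle penalty.
    for n in (1, 2):
        busy = pipeline_utilization(0.01, 30, threads=n)
        print(f"{n} thread(s): {busy:.1%} of cycles do useful work")
```

Under these assumed numbers a single thread keeps the pipeline busy about 77% of the time, and a second thread is enough to fill the remaining slots. Real hardware falls short of this ideal because thread switches and contention for shared caches are not free.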
With multi-threaded processing, other threads can be brought in to fill these idle slots, giving a significant speed increase: not necessarily a doubling, but potentially 20% to 40%. Achieving this requires only about a 15% increase in transistor count. Considering that converting a typical single-core processor to dual-core raises performance by around 40% to 70% while nearly doubling the transistor count, MIPS's MT technology is clearly highly efficient. However, MT has a serious drawback: excessively frequent context switching during multi-threaded processing can cause significant performance degradation.
△Caption: Schematic diagram of the MIPS 74K processor architecture.
MIPS has a large product line, including the single-threaded 24K and 74K series and the multi-threaded 34K series. The 74K, released in June of this year, runs at over 1GHz on a 65nm process. It combines a general-purpose processor with a DSP core, but its overall performance and power consumption are slightly inferior to those of the similarly designed ARM Cortex-A8. The 34K series, the flagship of the multi-threaded processors, can be configured with one or two Virtual Processing Elements (VPEs) and up to five Thread Contexts (TCs), providing ample configurability. Essentially, two VPEs emulate two cores, enabling a 34K core to run two independent operating systems simultaneously or a single two-way symmetric multiprocessing operating system. The MIPS32 34Kc core uses a 90nm process and has a worst-case operating frequency of 500MHz. The core size is 2.1mm², and the core power consumption is 0.56mW/[email protected]. Currently, this series includes the 34Kc, 34Kf, 34Kc Pro, and 34Kf Pro. The 34Kf variants provide hardware floating-point operations fully compliant with the IEEE 754 standard.
The 34Kc Pro and 34Kf Pro cores feature CorExtend, which allows SoC developers to extend the instruction set.
△Caption: Schematic diagram of the MIPS 34K processor architecture.
According to MIPS's own estimates, a 34K configured with 2 VPEs and 2 TCs improves performance by more than 60% over a 24K processor. The chip area increases by approximately 14%, and the cache miss rate rises from 4.41% to 5.16% because of multi-threaded operation, which is within an acceptable range. However, compared with the single-threaded 74K, the 34K is less suitable for compute-intensive workloads such as network or multimedia streaming, and adding VPE and TC units also increases chip area. Although the limitations of MT technology make it unsuitable for multimedia encoding and decoding, in automotive electronics some manufacturers have successfully combined two 34K processors into a dual-core multi-threaded processor that delivers excellent performance. With this successful precedent, we can expect more MIPS solutions combining multi-core and multi-threading to emerge. How much of the cost advantage will remain, however, is a question for solution providers to ponder.
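The cost-versus-performance argument made for MT can be checked with simple arithmetic on the figures quoted in this article. The Python sketch below is only a rough comparison: the helper names are hypothetical, "nearly doubling" the transistor count is approximated as +100%, and "more than 60%" is taken as exactly +60%.

```python
def relative_efficiency(perf_gain: float, cost_increase: float) -> float:
    """Performance per unit of silicon relative to the baseline design.

    Both arguments are fractions (0.15 means +15%). A result above 1.0 means
    more performance per transistor (or per unit area) than the baseline;
    a result below 1.0 means less.
    """
    if cost_increase <= -1.0:
        raise ValueError("cost_increase must be greater than -100%")
    return (1.0 + perf_gain) / (1.0 + cost_increase)


def relative_miss_increase(before: float, after: float) -> float:
    """Relative rise in cache miss rate, e.g. 0.17 for a 17% rise."""
    return (after - before) / before


# (performance gain, transistor or area increase) as quoted in the article.
# Assumptions: "nearly doubling" is +100%, "more than 60%" is +60%.
CASES = {
    "MT, low estimate": (0.20, 0.15),
    "MT, high estimate": (0.40, 0.15),
    "Dual-core, low estimate": (0.40, 1.00),
    "Dual-core, high estimate": (0.70, 1.00),
    "34K (2 VPE, 2 TC) vs 24K": (0.60, 0.14),
}

if __name__ == "__main__":
    for name, (gain, cost) in CASES.items():
        print(f"{name:26s} {relative_efficiency(gain, cost):.2f}x")
    rise = relative_miss_increase(0.0441, 0.0516)
    print(f"Relative rise in cache miss rate: {rise:.0%}")
```

With these figures, multi-threading delivers roughly 1.04x to 1.22x the baseline performance per transistor, the dual-core case 0.70x to 0.85x, and the 34K estimate about 1.40x per unit area. The miss rate rise from 4.41% to 5.16% is an increase of about 17% in relative terms.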