Development Trends and Comparative Analysis of Embedded System Architecture

Embedded systems are now used in almost every field and touch daily life, from small handheld digital products to automobiles and spacecraft. The term usually brings microcontrollers (MCUs) to mind first, and MCUs are indeed the most basic and most common embedded processors. However, other embedded platforms such as FPGAs, ARM, DSPs, and MIPS are becoming increasingly widespread. Embedded processors combined with analog circuits or other functional circuits to form SoCs (System on Chip) or SiPs (System in Package) are increasingly used in complex products such as mobile phones and set-top boxes.

In general, the development of embedded systems exhibits the following characteristics:

• Transition from 8-bit processing to 32-bit
• Transition from single-core to multi-core
• Development towards networking capabilities
• Simultaneous development of MCUs, FPGAs, ARM, DSPs, etc.
• Diversification of embedded operating systems

All embedded processors are built on some architecture, which is licensed in the form of IP (Intellectual Property) cores. Many manufacturers produce processors, but only a handful own IP cores. A company that owns its IP cores can hold a strong market position simply by licensing them.

Embedded system architectures are divided into proprietary and standard architectures. In MCU products, companies like Renesas, Freescale, and NEC have their own proprietary IP cores, while other embedded processors are based on standard architectures. This article discusses only embedded systems with standard architectures.

Standard embedded architectures fall into two main camps, RISC and CISC, with RISC (Reduced Instruction Set Computer) processors currently dominating. The RISC ecosystem is very broad, including ARM, MIPS, PowerPC, ARC, Tensilica, and many others.
However, although these processors all belong to the RISC camp, they differ in instruction set design and processing unit structure, so they are not binary-compatible with one another. Software developed for one platform cannot be used directly on another hardware platform and must at least be recompiled.

The second camp is the CISC (Complex Instruction Set Computer) architecture, to which Intel's x86 processors belong. CISC is generally regarded as a less efficient architecture for embedded use: its instruction set carries too many features in pursuit of comprehensiveness, which significantly increases chip complexity. The x86 processors used in embedded systems in the past were mostly older-generation products; for example, the Pentium 3, which left the personal computer market several years ago, is still commonly seen in industrial computers. This generation represents the sweet spot of the x86 architecture in performance per watt and has been proven stable over a long period in the market, so it is often used where performance requirements are modest but stability is crucial, such as industrial control equipment.

ARM Processors in the RISC Family

ARM was founded in Cambridge, England, in 1990 and primarily licenses chip design technology. Processors built on ARM's intellectual property (IP) cores, commonly known as ARM processors, are now ubiquitous in industrial control, consumer electronics, communication systems, network systems, and wireless systems. ARM-based processors account for over 75% of the 32-bit RISC microprocessor market. ARM technology has permeated so many aspects of daily life that it can fairly be called an indispensable part of our environment. The most common ARM processor architectures on the market are ARM7, ARM9, and ARM11.
The newly launched Cortex series is still under development and verification, and no related products have been released yet.

ARM was also the first vendor to introduce a multi-core architecture for embedded processors. Its first multi-core design was the ARM11 MPCore, built on the existing ARM11 processor core. The ARM11 core was released in October 2002. To further improve performance, its pipeline was lengthened to eight stages: prefetch (two stages), decode, issue, shift/MAC1, execute/MAC2, memory access/MAC3, and write-back. It implements the ARMv6 instruction set architecture. The ARM11 used the then-leading 0.13μm manufacturing process, and its operating frequency could reach 500MHz to 700MHz. On a 90nm process, the ARM11 core could readily exceed 1GHz, a remarkable figure for an embedded processor. However, 1GHz was clearly not a balanced operating point for the ARM11 architecture, so almost no manufacturers released ARM11 processors reaching 1GHz.

The ARM11's logic core also received numerous improvements, the most important being combined static/dynamic branch prediction. The ARM11 includes a 64-entry, four-state branch target address cache, used to store recently used branch target addresses. When the dynamic prediction mechanism cannot find a matching address in this cache, static prediction takes over immediately. In actual tests, dynamic prediction alone is 88% accurate and static prediction alone only 77%, while the ARM11's combined static/dynamic mechanism reaches 92%.

To address the increased power consumption caused by high clock speeds, ARM11 employs a smart power management technology called IEM (Intelligent Energy Manager).
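IEM-style power management lowers the supply voltage when the load is light. As a rough illustration of why that saves power, the sketch below uses the common first-order CMOS approximation that dynamic power scales with V² × f. The function names and the example scaling ratios are hypothetical; the 0.6mW/MHz average and the 500MHz to 700MHz range are the figures this article quotes for the ARM11 core, and nothing here describes how IEM is actually implemented.

```python
"""First-order power estimates for a core such as the ARM11 (illustrative only)."""

MW_PER_MHZ_CORE = 0.6  # average core figure quoted in the article


def core_power_mw(freq_mhz: float, mw_per_mhz: float = MW_PER_MHZ_CORE) -> float:
    """Estimate average power in mW at a given clock frequency."""
    return mw_per_mhz * freq_mhz


def relative_dynamic_power(voltage_ratio: float, freq_ratio: float) -> float:
    """Dynamic power relative to nominal, using the approximation P ~ C * V^2 * f.

    Both arguments are fractions of the nominal value (1.0 means unchanged).
    """
    return voltage_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    for f in (500, 700):
        print(f"{f} MHz -> about {core_power_mw(f):.0f} mW")
    # Hypothetical light-load operating point: 80% voltage, 50% clock.
    print(f"scaled power: {relative_dynamic_power(0.8, 0.5):.2f} of nominal")
```

Under these assumptions, halving the clock alone would halve dynamic power, while also dropping the voltage to 80% of nominal cuts it to roughly a third.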
IEM dynamically adjusts the processor voltage according to the task load, thereby reducing power consumption. These improvements further enhance the power efficiency of ARM11, which consumes an average of only 0.6mW per MHz (0.8mW with cache) and reaches a maximum of 660 Dhrystone MIPS, far exceeding the previous generation.

The ARM11 MPCore shares the ARMv6 instruction set architecture with ARM11. Depending on application requirements, an MPCore can be configured with one to four processors. According to official data, its maximum performance can reach approximately 2600 Dhrystone MIPS. MPCore is a standard homogeneous multi-core processor made up of as many as four cores based on the ARM11 architecture. The advantage of a multi-core design is that it significantly improves processor performance without raising the clock frequency, which promises excellent performance in multitasking applications and suits the needs of future home consumer electronics. For example, a set-top box can record multiple TV channels while the viewer watches video-on-demand programs over the internet; an in-car navigation system can provide navigation while still having the capacity to play video streams to rear passengers.

MIPS Processors in the RISC Family

MIPS is a long-established RISC processor architecture from the United States. Its architecture design, much like the American character, is quite ambitious and idealistic. The origins of the MIPS architecture can be traced back to the early 1980s, when Stanford University and the University of California, Berkeley, simultaneously began research on RISC processors. Founded in 1984 as MIPS Computer Systems, the company launched its first processor, the R2000, in 1986. It was acquired by SGI in 1992. However, following the decline of the MIPS architecture in the desktop market, it separated from SGI in 1998, becoming MIPS Technologies.
In 1999, it restructured its strategy, shifting its market focus to embedded systems and unifying its processor architecture into two families, 32-bit and 64-bit, with technology licensing as its primary profit model. While MIPS has a very small presence in mobile phones, it has achieved considerable success in general digital consumer electronics, VoIP, personal entertainment, communications, and business applications. However, its market share has declined slightly in recent years with the rise of other IP licensing companies. MIPS is most widely used in home audio-visual appliances (including set-top boxes), networking products, and automotive electronics.

MIPS' core technology emphasizes multi-issue processing (also called multi-issue core technology), in which a single core issues instructions from several hardware threads. Generally, multi-core and multi-issue are two non-exclusive architectures that can be combined. However, in the embedded field, the two major processor IP vendors, ARM and MIPS, take different attitudes towards them, and the two approaches compete in the embedded market. MIPS's multi-issue architecture is the 32-bit MIPS 34K series. Architecturally, multi-issue is a compromise aimed at minimizing idle processing units: it presents otherwise idle units in the processor as virtual cores to improve utilization. Technically, both multi-core and multi-issue designs demand considerably more software optimization than single-core designs before hardware multiprocessing pays off. The 34K core can run existing two-way SMP (symmetric multiprocessing) operating systems and application software. With active management by the operating system, existing application software can also make good use of multi-issue processing. It can also be used in AMP (asymmetric multiprocessing) environments, where multiple execution threads each have different roles.
Furthermore, a 34K core can be configured with one or two Virtual Processing Elements (VPEs) and up to five thread contexts, providing considerable design flexibility. MIPS's multi-issue architecture has redundant hardware storage to hold execution state across task switches, avoiding the delays caused by reloading instructions or re-executing work during a switch. However, even with the ability to execute multiple tasks simultaneously, a multi-issue processor is essentially still a single-core processor. When one thread is under heavy load, the processing time available to other threads may be squeezed or even suspended. Memory locking, unlocking, and synchronization also still occur in multi-issue architectures. Therefore, in extreme cases, multi-issue performs significantly worse than a native multi-core architecture (comparing two threads to two cores). The advantage of multi-issue lies in its high hardware efficiency and, in theory, its effective reduction of power consumption. Some IC design companies have launched parallel multi-core systems based on the MIPS architecture, combining multi-core and multi-issue capabilities. This approach is expected to be incorporated into the native MIPS architecture in the future to handle more complex applications.

PowerPC in the RISC Family

PowerPC is a RISC multi-issue architecture. In the 1990s, IBM, Apple, and Motorola jointly developed PowerPC chips and built multiprocessor computers based on them. The PowerPC architecture is characterized by its scalability and flexibility. The first generation of PowerPC used a 0.6-micron manufacturing process and integrated about 3 million transistors per chip.
Motorola integrated the PowerPC core into System-on-a-Chip (SoC) designs, creating dozens of embedded communication processors in the Power QUICC (Quad Integrated Communications Controller), Power QUICC II, and Power QUICC III families. Representative PowerPC-based embedded processors from Motorola include the MPC505, 821, 850, 860, 8240, 8245, 8260, and 8560. Among these, the MPC860 is typical of the Power QUICC series, the MPC8260 of the Power QUICC II series, and the MPC8560 of the Power QUICC III series.

Power QUICC series microprocessors generally consist of three functional modules: the embedded PowerPC core (EMPCC), the system interface unit (SIU), and the communication processor module (CPM). All three modules share a 32-bit internal bus. In addition, the Power QUICC integrates a 32-bit RISC core. The PowerPC core primarily executes high-level code, while the RISC core handles low-level communication functions. The two processor cores cooperate through up to 8KB of internal dual-port RAM to jointly complete the communication control and processing functions of the MPC854.

The CPM is built around a RISC controller. In addition to the RISC controller, it includes seven Serial DMA (SDMA) channels, two Serial Communication Controllers (SCCs), one Universal Serial Bus (USB) channel, two Serial Management Controllers (SMCs), one I2C interface, and one Serial Peripheral Interface (SPI). It can be flexibly programmed to support Ethernet, USB, T1/E1, ATM, and various communication protocols such as UART and HDLC.

Power QUICC II can be considered the second generation of Power QUICC, offering higher performance in terms of flexibility, scalability, and integration. Power QUICC II also integrates an embedded PowerPC core and a communication processor module (CPM).
This dual-processor architecture is more power-efficient than traditional architectures because the CPM offloads peripheral interface tasks from the embedded PowerPC core. The CPM alternately supports three Fast Serial Communication Controllers (FCCs), two Multi-Channel Controllers (MCCs), four Serial Communication Controllers (SCCs), two Serial Management Controllers (SMCs), one Serial Peripheral Interface (SPI), and one I2C interface. The integration of the embedded PowerPC core and the CPM, along with the other features and performance improvements of Power QUICC II, shortens the development cycle for engineers working on networking and communication products.

Compared to Power QUICC II, Power QUICC III offers higher integration, more powerful functionality, and better performance enhancement mechanisms. The CPM in Power QUICC III operates at 333MHz, about 66% faster than the 200MHz CPM in Power QUICC II, while maintaining backward compatibility with earlier products. This allows customers to preserve as much of their existing software investment as possible, simplify future system upgrades, and significantly reduce development cycles. Through the scalability and customization of its microcode, Power QUICC III enables customers to develop distinctive products for different application areas. This microcode reuse capability, present since Power QUICC II, has become a key design consideration for simplifying upgrades and reducing their cost. PowerPC is generally used in servers, high-performance dedicated computers, and game consoles.

The ARC Architecture in the RISC Family

Compared to other RISC processor technologies, ARC has secured its place across a diverse range of chip applications through its configurable architecture. The configurable approach recognizes that different applications require different functionality. While a fixed chip architecture may be comprehensive, some functions may never be used once designed into a product.
Even if unused, these unnecessary functions still cost the developer money, resulting in waste. Advances in manufacturing processes and the shrinking of chips allow semiconductor manufacturers to cut more chips from the same wafer size. Standardization helps streamline the chip design process; processors built from a single general-purpose IP can serve many purposes, eliminating the need for separate production capacity for specific models or functions. Mass production also reduces the cost per chip, a common pattern in embedded processors today. The ARC design concept instead aims to minimize the cost per chip by tailoring the design to specific needs, which requires dedicated EDA software during the design phase.

ARC recently launched a multimedia acceleration processor based on the 700 series, integrating the ARC 700 general-purpose processing core and a high-speed SIMD processing unit. It can handle the H.264 encoding and decoding used for Blu-ray discs at low clock speeds. This architecture is called VideoSubsystem. The application processor can handle general-purpose computing tasks on its own, but it can also work alongside other architectures such as ARM or MIPS to provide application compatibility while accelerating audio and video data streams.

The Tensilica Architecture in the RISC Family

Tensilica's Xtensa processor is a freely configurable, flexibly extensible, and automatically synthesized processor core. Xtensa was the first microprocessor designed specifically for embedded systems-on-chip. To let system design engineers flexibly plan and implement the various application functions of an SoC, Xtensa was conceived from the early stages of development as a freely composable architecture, which is why its architecture is described as an adjustable design.
Tensilica's main product line is Xtensa, which lets system design engineers select the architectural units they need, add their own instructions and hardware execution units, and build processor cores several times more powerful than those produced by traditional methods. The Xtensa generator automatically and efficiently produces a complete set of software tools, including an operating system, for each specific processor configuration. Xtensa is a 32-bit processor architecture characterized by a compact, high-performance instruction set with 16-bit and 24-bit encodings, designed specifically for embedded systems. Its base architecture has 80 RISC instructions, a 32-bit ALU, six special-function registers, and 32 or 64 general-purpose 32-bit registers. These 32-bit registers all have dedicated paths for accelerated execution. The Xtensa instruction set is quite compact, allowing system designers to shorten program code, thereby increasing code density and reducing power consumption. This yields significant cost savings for highly integrated single-chip ASICs. Xtensa's instruction set architecture includes efficient branch instructions, such as combined compare-and-branch instructions and zero-overhead loops, as well as bit manipulation, including funnel shifts and field-extract operations. Floating-point units and vector DSP units are two optional processing units in the Xtensa architecture that can enhance performance in specific applications.

The x86 Family of CISC Processors

x86 processors have a long history of application in embedded systems. For example, Intel's Pentium 3 era processors and chipsets are still widely used in industrial PCs. With the two major x86 manufacturers abandoning their RISC product lines and actively planning mobile applications, x86's entry into the consumer electronics embedded market is no longer just a rumor.
Of course, x86 platforms have generally suffered from high power consumption and a high chip count, making them ill-suited to the lean, power-efficient designs that embedded applications require. With continued development, however, this has fundamentally changed. Although the Pentium 4 was a relatively unsuccessful product for Intel, the Pentium 3 remains a market favorite, and even Intel itself is reluctant to abandon the Pentium 3 microarchitecture. After several rounds of reworking, even the latest quad-core products still retain traces of the Pentium 3. Selling products based on an older architecture is actually quite beneficial for Intel: the architecture has been proven over time, does not require redesign, and has very low production costs, and process improvements can further increase chip output. Intel is not the only company with such products; AMD takes the same approach with its Athlon XP processors.

Intel, however, is determined to make it difficult for competitors to catch up, and has therefore planned a series of embedded processors aimed primarily at mobile applications. In the past, Intel's x86 product planning almost never involved mobile communication applications. Even its much-hyped but ultimately underwhelming UMPC products lacked mobile communication capabilities such as 3G and 3.5G. Once WiMAX, which Intel champions, was officially included in the 3G standard, Intel reconsidered its mobile application products. In recent technology demonstrations, even the MID (Mobile Internet Device) designs, which most closely resemble mobile phones, were positioned as mobile internet access tools, not mobile communication systems. However, according to Intel's latest plans, the MID platform has shifted from simply providing mobile internet access to entering the same market as existing devices like BlackBerry and iPhone.
BlackBerry boasts powerful internet communication capabilities, while iPhone offers robust multimedia capabilities. Intel's MID platform, however, is essentially a miniature x86 computer system with comprehensive functionality and a base of x86-compatible software that processor architectures like ARM and MIPS find difficult to match. Adopting it for mobile communications is simply a matter of adding software modules to the existing hardware plan. Traditional mobile communication product manufacturers have invested heavily in their own hardware and software; whether they will adopt the Intel architecture remains to be seen. Of course, if Nokia and Motorola also adopt this architecture, it will greatly boost Intel's promotion efforts.

Intel's latest processor for mobile applications is Stealey, currently offered in two models: the 600MHz A100 and the 800MHz A110. In essence, Stealey is simply a heavily simplified, down-clocked version of the Dothan processor, the grandfather of the Centrino line (both trace back to the Pentium 3 architecture), and it benefits mainly from the 90nm process. Its architecture offers few unique features. The next-generation Silverthorne, however, moves directly to a 45nm process and adds 64-bit processing capabilities, making it a more substantial product. At present, though, Intel's mobile application platform is not performing well. Excessive power consumption and heat remain concerns for its use in mobile devices. Silverthorne must address these two issues; otherwise, MID products may follow in the footsteps of UMPC and become another case of being praised but not commercially successful.

VIA in the CISC Family

Among the surviving x86 processor manufacturers, besides Intel, the world's largest semiconductor manufacturer, the other two are struggling, especially VIA (VIA Technologies) from Taiwan. The company has always faced pressure from the major manufacturers in its processor product line.
While VIA's earlier low-power processors offered lower performance, their power consumption control was excellent, far surpassing that of Intel and AMD. Now, with the global trend shifting from raw performance to energy efficiency, VIA has finally found success. Besides gaining popularity in low-priced PCs, it also provides excellent solutions for UMPCs and embedded systems.

VIA's mainstream product line is the C7-M processor, which comes in a standard version and an Ultra Low Voltage (ULV) version. The standard C7-M models run at 1.5GHz/400MHz FSB, 1.6GHz/533MHz FSB, 1.867GHz/533MHz FSB, and at most 2GHz/533MHz FSB, with voltages ranging from 1.004V to 1.148V and maximum TDP from 12W to 20W. In P-State mode, the voltage drops to 0.844V, and the TDP is only 5W. The ULV version of the C7-M comes in 1GHz/400MHz FSB, 1.2GHz/400MHz FSB, and 1.5GHz/400MHz FSB models. It requires only 0.908V to 0.956V, and its maximum TDP is around 5W to 7W. Among the ULV processors there is also a Super ULV 1GHz part, the C7-M ULV 779, with an operating voltage as low as 0.796V and a maximum power consumption of only 3.5W. These features make VIA a force to be reckoned with in low-cost computers and embedded applications.

AMD in the CISC Family

AMD is the only x86 processor manufacturer in the mainstream market that can rival Intel. However, since acquiring ATI, its performance has been lackluster, and its mainstream product lines (both CPUs and GPUs) are currently on the defensive. Even so, AMD has the most balanced technology portfolio in the industry: advanced processor technology, the second-ranked GPU technology, high-performance motherboard chipsets, and its own fabs. While not as dominant as Intel, it still holds a significant market share.
In embedded applications, AMD previously licensed IP from MIPS for its Alchemy product line, a direct challenge to Intel's former ARM-based XScale architecture. However, this product failed to capture market attention and remained weak in applications. AMD later abandoned this model and began to focus on embedded applications using its own x86 processors. AMD's embedded x86 product line is Geode, a highly integrated SoC, but it suffers from low clock speeds, poor performance, and functionality that does not surpass VIA's products, leaving it awkwardly positioned.

Information on an x86 core developed for mobile platforms, codenamed "Bobcat," is currently incomplete, so we can only speculate based on the timeline. This processor may use a simplified, down-clocked version of the K8 core, but even a release this year would be significantly later than its competitors. If it integrates the K10 core, it may not be available until after 2009. However, AMD's acquisition of ATI allows it to bring the Fusion concept to consumers. Fusion combines the strengths of both AMD and ATI: advanced processor cores, high-performance graphics cores, and I/O control capabilities. From low-power embedded applications to high-power performance products, Fusion products cover a wide range of applications. However, Fusion will not enter the market until 2009 at the earliest.
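
Returning to the combined static/dynamic branch prediction described for the ARM11: the sketch below is a minimal model, assuming a direct-mapped 64-entry table of two-bit (four-state) counters and a backward-taken, forward-not-taken static fallback. The entry count and the number of states come from this article; the indexing scheme, the allocation policy, and all class and function names are hypothetical and are not ARM's implementation.

```python
from dataclasses import dataclass
from typing import Dict

BTAC_ENTRIES = 64  # entry count quoted in the article


@dataclass
class BtacEntry:
    tag: int      # full branch address, used to detect aliasing
    target: int   # last seen branch target
    counter: int  # 0..3: the four prediction states


class HybridPredictor:
    """Dynamic prediction on a table hit, static prediction on a miss."""

    def __init__(self, entries: int = BTAC_ENTRIES) -> None:
        self.entries = entries
        self.table: Dict[int, BtacEntry] = {}

    def _index(self, pc: int) -> int:
        # Assumed indexing: word-aligned address modulo the table size.
        return (pc >> 2) % self.entries

    def predict(self, pc: int, target: int) -> bool:
        entry = self.table.get(self._index(pc))
        if entry is not None and entry.tag == pc:
            return entry.counter >= 2  # dynamic: states 2 and 3 mean "taken"
        return target < pc             # static: backward branches are taken

    def update(self, pc: int, target: int, taken: bool) -> None:
        idx = self._index(pc)
        entry = self.table.get(idx)
        if entry is None or entry.tag != pc:
            # Allocate (or replace) with a weak state matching the outcome.
            self.table[idx] = BtacEntry(pc, target, 2 if taken else 1)
            return
        entry.target = target
        if taken:
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)


def run_loop_branch(iterations: int = 10) -> int:
    """Count correct predictions for a loop branch taken on all but the last pass."""
    predictor = HybridPredictor()
    pc, target = 0x1000, 0x0F00  # hypothetical backward loop branch
    correct = 0
    for i in range(iterations):
        taken = i < iterations - 1
        if predictor.predict(pc, target) == taken:
            correct += 1
        predictor.update(pc, target, taken)
    return correct
```

In this model the first pass through the loop misses in the table and falls back to the static rule, while later passes hit and use the counter; only the final loop exit is mispredicted, so `run_loop_branch(10)` counts 9 correct predictions out of 10.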
