AI Chip-Making Strategies: A Look at Baidu Kunlun

The chip-making boom coincided almost exactly with the explosion of artificial intelligence. This wave of AI growth was driven by the rise of deep learning, which needs ever more training data and ever more computing power. As traditional chips gradually failed to meet the computing demands of the internet's explosive growth, internet companies with advanced algorithms and powerful computing capabilities became the driving force behind in-house chip development. Companies in China and abroad embarked on this new chapter almost simultaneously. China's moves in this market will play a crucial role in innovation across the industry. According to one study, China now accounts for 60% of global semiconductor consumption. According to *International Business Strategy*, in 2019 semiconductor consumption reached $212.2 billion in China, $59.5 billion in North America, $48.8 billion in the rest of the world, $41.8 billion in Europe, and $38.7 billion in Japan.
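
The regional figures above can be turned into percentage shares with a few lines of arithmetic. The sketch below uses only the five numbers quoted from *International Business Strategy*; the names `consumption_2019` and `regional_shares` are illustrative choices, not anything from the source.

```python
# Regional figures for 2019 as quoted in the text, in billions of US dollars.
consumption_2019 = {
    "China": 212.2,
    "North America": 59.5,
    "Rest of world": 48.8,
    "Europe": 41.8,
    "Japan": 38.7,
}


def regional_shares(figures):
    """Return each region's percentage share of the total, rounded to 0.1."""
    total = sum(figures.values())
    return {
        region: round(100 * value / total, 1)
        for region, value in figures.items()
    }


# Total across the five regions, rounded to hide floating-point noise.
total = round(sum(consumption_2019.values()), 1)
shares = regional_shares(consumption_2019)

print(f"Total: ${total} billion")
for region, pct in shares.items():
    print(f"{region}: {pct}%")
```

On these five figures alone, China's share works out to roughly 53% of a total of about $401 billion. The 60% figure cited earlier comes from a different study, so the two numbers need not agree.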

Figure: Global Semiconductor Consumption by Region in 2019 (Billions)

Currently, domestically produced AI chips are at a critical juncture, characterized by:

1. A vast market accommodating many major players;
2. The early stages of explosive growth, with large-scale applications yet to materialize;
3. Scattered and complex application scenarios requiring customization;
4. Individual chips are insufficient, necessitating supporting solutions.

Against this backdrop, what opportunities can Baidu, the "first AI stock," seize in chip manufacturing, and how competitive will it be?

More than two years after its release, how much has Kunlun achieved?

The most direct way to measure the quality and success of a chip is by its shipment volume. Baidu Kunlun was announced in 2018, and in December 2019 Baidu and Samsung announced that development of the first-generation Kunlun, Baidu's first AI chip for cloud and edge computing, was complete. To date, 20,000 mass-produced Kunlun 1 units have been deployed in Baidu's search engine and with Baidu's cloud computing customers. Compared with chips from other domestic internet companies, that shipment volume is quite impressive. From the perspective of matching technology to application scenarios, the new wave of AI chips needs to target different types of AI applications. The requirements go beyond simply being suitable for deep learning: chips need to balance computing power, energy consumption, and flexibility.

Cloud computing giants are increasingly pairing cloud computing with FPGA chips, primarily because FPGAs, as programmable chips, are well suited to cloud platforms that provide virtualization services. The flexibility of FPGAs lets cloud service providers adjust the supply of FPGA-accelerated services according to market demand.

Computational speed and power consumption are core performance metrics for evaluating a chip. Kunlun chips are positioned as general-purpose AI chips, aiming to provide high-performance, low-cost, and highly flexible AI solutions. Notably, Kunlun chips can perform both training and inference, meeting the high processing demands of AI for use in cloud and edge environments, including data centers, public clouds, and autonomous vehicles. It is understood that Kunlun 2 will be built using a 7nm process and will enter mass production in 2021, offering three times the performance of its predecessor.

A brokerage report indicates that "such large-scale, cloud-based AI chips with high computing power have a very high technological threshold. Only Baidu, Huawei, and Cambricon can produce these products." Before the Kunlun chip emerged, Baidu had already deployed over 10,000 FPGA accelerators in its internal data centers and autonomous driving systems in 2017, laying the initial foundation for testing the Kunlun chip across industries and scenarios. Subsequently, Baidu Smart Cloud delivered an all-in-one Baidu Cloud quality inspection machine equipped with the Baidu Kunlun chip to Weiyi Intelligent Manufacturing, where the intelligent quality inspection equipment has been deployed and is in operation.

Hardware alone is far from enough; a holistic solution is also crucial for commercialization. Baidu has proposed an AI-Native cloud computing architecture, which supports intelligent applications across industries through a cloud-integrated, end-to-end approach, encompassing infrastructure such as AI computing clusters and AI chips, engineering platforms like PaddlePaddle and cloud-native technologies, and application development platforms such as video cloud and blockchain.

Of course, accumulating intangible assets is also crucial for technology-driven companies; Qualcomm, the mobile chip giant, has built much of its global dominance on patent licensing fees. In AI patent applications and grants, Baidu has ranked first in China for three consecutive years, and its AI open platform has gathered 2.65 million developers. Baidu's market share outside China is not large, and competing with established rivals will be extremely difficult. Nevertheless, in the current environment, the progress of Kunlun's mass production and delivery highlights the overall momentum of AI development in China and reflects the determination of Chinese companies to establish global leadership in this emerging field.

In fact, China has long been a strong contender in AI. According to the 222-page "2021 AI Index Report" released by Stanford University, China's share of AI journal citations surpassed that of the United States for the first time in 2020. In total AI journal publications, China briefly overtook the United States as early as 2004 and then retook the lead in 2017.

Customization, Modification, Secondary Development

As early as 2011, Baidu launched its FPGA AI Accelerator project. By 2015, over 5,000 FPGAs had been deployed, and in 2017 Baidu became the industry leader with over 12,000 deployed chips. In 2018, Baidu released its self-developed AI chip, Baidu Kunlun. The chip completed tape-out in 2019, and in 2020 the first generation of Kunlun began mass production and large-scale deployment. We won't go into too much detail about the Kunlun chip, but it's worth noting that while GPUs are often the starting point for AI acceleration, Baidu's approach was based on FPGAs from the outset. The defining characteristic of FPGAs is their programmability, which allows users of the Kunlun chip to customize, modify, and further develop it according to their specific application scenarios.

Because AI applications are dispersed and complex, customization is particularly important. The flexibility of FPGAs, carried over to the Kunlun chip, allows users to customize, modify, and further develop the chip according to their specific needs and application scenarios, enabling faster market penetration and subsequent product iterations. In terms of performance, Baidu Kunlun chips deliver up to three times the performance of the NVIDIA T4. Looking at the global AI chip landscape, according to "Survey and Benchmarking of Machine Learning Accelerators," a statistical study from MIT Lincoln Laboratory's Supercomputing Center, Kunlun chips also rank highly in performance and power among publicly announced AI accelerators and processors worldwide. The chart below shows the capabilities of some recently announced AI processors (as of May 2019), listing each chip's peak performance and power consumption.

Publicly announced performance and power distribution of AI accelerators and processors (Source: MIT research "Survey and Benchmarking of Machine Learning Accelerators"). Note: The x-axis represents peak power, and the y-axis represents peak giga-operations per second (GOps/s). Computational precision is indicated by the shape of each marker; the precision ranges from a single bit (int1) to a single byte (int8), and from 4 bytes (float32) to 8 bytes (float64). The form factor is indicated by color, which is important for showing how much power is consumed and how much computation can be packed onto a single chip, a single PCI card, or an entire system. Blue represents the performance and power consumption of a single chip only. Orange represents the performance and power consumption of a single card (note that these are all in the 200-300W range). Green represents the performance and power consumption of an entire system, in this case single-node desktop and server systems.
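
One common way to read a chart with these two axes is as an efficiency ratio: peak operations per second divided by peak power. The following is a minimal sketch under stated assumptions. The `Accelerator` class and the three sample entries are hypothetical placeholders chosen only to illustrate the calculation; they are not values from the MIT survey or measurements of any real chip.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    peak_gops: float   # peak giga-operations per second (the chart's y-axis)
    peak_watts: float  # peak power in watts (the chart's x-axis)
    form_factor: str   # "chip", "card", or "system" (the chart's color coding)

    def gops_per_watt(self) -> float:
        """Efficiency: peak throughput divided by peak power."""
        return self.peak_gops / self.peak_watts


# Hypothetical placeholder values, NOT data for any real product.
samples = [
    Accelerator("chip-A", 4_000.0, 2.0, "chip"),
    Accelerator("card-B", 100_000.0, 250.0, "card"),
    Accelerator("system-C", 1_000_000.0, 3_200.0, "system"),
]

# Rank from most to least efficient.
ranked = sorted(samples, key=lambda a: a.gops_per_watt(), reverse=True)

for acc in ranked:
    print(f"{acc.name} ({acc.form_factor}): {acc.gops_per_watt():.1f} GOps/s per watt")
```

Ranking by this ratio rather than by raw peak throughput keeps a low-power chip and a full server system on a comparable footing, which is the comparison a log-log performance-versus-power chart invites.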

It's widely acknowledged in the industry that "AI applications are fragmented and difficult to implement." After a massive influx of AI chip makers, many companies disappeared when the bubble burst, leaving only a dozen or so. So, what can Baidu rely on?

Chip makers need not only hardware expertise but also AI algorithm software knowledge. Baidu Kunlun, a core component of Baidu's AI platform, natively supports the open-source deep learning framework PaddlePaddle, Baidu Machine Learning Platform (BML), and various vertical AI capability engines. Furthermore, Kunlun supports not only mainstream global CPUs and operating systems, and deep learning frameworks like PyTorch and TensorFlow, but also collaborates closely with domestic manufacturers to support Chinese CPUs such as Phytium, Shenwei, and Hygon, as well as domestic operating systems like Kylin, Deepin, and UnionTech. Moreover, ecosystem capabilities are crucial in the chip industry, and integrating them into one's own products allows for a significant cost-performance advantage.

We can draw inspiration from Huawei's Kirin chip in this regard. Continuous trial and error on Huawei's own mobile devices, and the support those devices provided, allowed the Kirin chip to improve through iteration. The chip and the phones complemented each other, ultimately creating a remarkable story in mobile phone history. Baidu is similar. Compared with AI chip companies that operate independently, it has the ecosystem advantage of a large company. Baidu doesn't need to worry the way typical AI chip makers do: "What kind of chip should I make? Will the chip meet market demand? And in what scenarios will it be used?" Baidu has plenty of application scenarios of its own, such as smart speakers, Apollo autonomous driving, and its intelligent cloud, all of which form a closed loop across Baidu's overall business.

Tesla was in a similar situation. Before developing its own FSD chip, Tesla relied on NVIDIA chips, which not only failed to meet Tesla's performance requirements but were also prohibitively expensive, leaving Tesla with no bargaining power. Tesla's self-developed chips subsequently matched or even exceeded the performance of NVIDIA's while keeping costs under control.

In its chip-making strategy, Baidu follows an approach similar to that of international internet giants such as Amazon, Google, and Microsoft: it produces chips primarily for its own use while also building an ecosystem around them, and that ecosystem in turn supports the chips. With the coming boom in smart cars and significant demand from the cloud computing and IoT markets, Baidu may find itself at the forefront of the industry.
