The recent session of the National Committee of the Chinese People's Political Consultative Conference (CPPCC) also addressed this issue: "Artificial intelligence has become a crucial arena of scientific and technological competition among nations. We must tap more deeply into the computing power potential of domestically produced AI chips, accelerate the development of domestically produced operating systems, solidify the computing power foundation for the development of artificial intelligence, and help new productive forces accelerate their growth." With that in mind, let's discuss AI chips.
As an integrated circuit designed and manufactured specifically for AI computing needs, AI chips have not only revolutionized the way computers process information, but have also played a crucial role in many cutting-edge fields such as image recognition, speech recognition, natural language processing, and autonomous driving.
Basic Concepts of AI Chips
AI chips, also known as AI accelerators or intelligent chips, are specialized microprocessors designed specifically for efficiently running artificial intelligence algorithms. Unlike traditional general-purpose processors such as CPUs and GPUs, AI chips focus on solving large-scale parallel computing problems in AI applications, especially intensive mathematical operations for neural network models, such as matrix multiplication, convolution operations, and activation function calculations. This highly customized design significantly improves computational efficiency, reduces energy consumption, and enables real-time response and high-performance inference capabilities.
Technical Principles and Architecture of AI Chips
The core computational principle of AI chips is the artificial neural network: the processing units inside the chip simulate the working mechanism of biological neurons. Each processing unit can independently perform complex mathematical operations, such as multiplying the input signals by weights and accumulating the results to form the neuron's activated output. The activation function, which determines how the accumulated signal is transformed into a meaningful result, is an indispensable part of the AI chip.
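The multiply-accumulate-then-activate mechanism described above can be sketched in a few lines of plain Python (an illustrative software model of one neuron, not actual chip code; the example values are made up):

```python
def relu(x):
    # A common activation function: negative signals are cut off at zero
    return max(0.0, x)

def neuron_forward(inputs, weights, bias):
    # Multiply each input signal by its weight, accumulate, then activate
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(weighted_sum)

x = [0.5, -1.0, 2.0]   # input signals (hypothetical)
w = [0.4, 0.3, 0.6]    # synaptic weights (hypothetical)
b = 0.1                # bias term
y = neuron_forward(x, w, b)
```

An AI chip's processing units perform exactly this multiply-accumulate pattern, but across thousands of neurons in parallel.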
AI chip hardware architectures vary widely and can be categorized as follows based on their design goals and application scenarios:
GPU (Graphics Processing Unit): Originally used primarily for graphics rendering, GPUs have become widely used for training large-scale deep learning models due to their strong parallel computing capabilities, especially adept at handling floating-point intensive computing tasks.
FPGA (Field Programmable Gate Array): FPGA has highly flexible programmability and can be quickly reconfigured at the hardware level to adapt to different AI algorithms, making it suitable for early development stages and dynamic workload scenarios.
ASIC (Application-Specific Integrated Circuit): ASIC is a chip customized for specific AI tasks. Compared to GPUs and FPGAs, it has higher computational efficiency and lower energy consumption in specific applications, but lacks versatility.
TPU (Tensor Processing Unit): Google's TPU is an ASIC instance specifically designed for machine learning tasks, focusing on efficient matrix operations, and is especially suitable for deep learning models under the TensorFlow framework.
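To give a sense of the workload scale these architectures target, here is a rough sketch that counts the multiply-accumulate (MAC) operations in a single convolution layer; the formula is the standard one for dense 2D convolution, and the layer dimensions are made-up examples:

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    # Each output element needs c_in * k * k multiply-accumulates;
    # there are h_out * w_out * c_out output elements in total.
    return h_out * w_out * c_out * c_in * k * k

# Hypothetical layer: a 3x3 convolution producing a 112x112x64
# feature map from a 3-channel input
macs = conv2d_macs(112, 112, 3, 64, 3)  # about 21.7 million MACs
```

A deep network stacks dozens of such layers, which is why throughput is quoted in TOPS (trillions of operations per second) and why dedicated matrix/convolution hardware pays off.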
Classification and Market Applications of AI Chips
AI chips are widely used in various fields, including but not limited to:
1. Autonomous driving: AI chips process data collected by vehicle sensors in real time, enabling accurate navigation and decision-making and improving the safety and reliability of autonomous driving.
2. Intelligent security: AI chips are used in video surveillance, facial recognition, and other security applications to improve the efficiency and accuracy of security monitoring.
3. Smart home: AI chips support the intelligent control and management of smart home devices, enhancing the living experience.
4. Healthcare: AI chips are used in medical image analysis, disease diagnosis, and related fields to assist doctors in providing precise treatment.
Current Status and Future Challenges of AI Chips in China
The domestic AI chip market has developed rapidly in recent years, giving rise to a number of innovative and competitive companies, including well-known ones such as Huawei, Cambricon, Horizon Robotics, and Baidu, as well as international companies like Nvidia. Below is a brief introduction to one chip from each of these companies:
Huawei HiSilicon's Ascend 910
Architecture: based on HiSilicon's Da Vinci architecture
Manufacturing process: 7nm
Number of cores: 32 Da Vinci AI Cores
Performance metrics:
Half-precision (FP16) computing power: up to 256 TeraFLOPS (trillion floating-point operations per second)
Integer precision (INT8) computing power: up to 512 TeraOPS (trillion integer operations per second)
Media processing: integrates a 128-channel full-HD video decoder alongside high-speed memory interfaces
Maximum power consumption: approximately 350 watts
Cambricon Siyuan 370 (MLU370)
Architecture: MLUarch03
Manufacturing process: 7nm
Computing power: up to 256 TOPS (INT8), 64 TFLOPS (FP16)
Transistor count: 39 billion
Memory support: LPDDR5
Application scenario: cloud computing data centers
Maximum power consumption: 250W
Horizon Robotics Journey 5
Architecture: dual-core BPU based on Horizon Robotics' self-developed Bayes architecture, optimized for AI computing
Computing power: up to 128 TOPS per chip, able to handle large numbers of parallel computing tasks
Power consumption: 30W
Process: 16nm
Application scenarios: In-vehicle AI in autonomous driving, smart cockpits, and intelligent monitoring.
Baidu Kunlun 2
Architecture: the Baidu Kunlun 2 chip adopts Baidu's self-developed second-generation XPU architecture, deeply optimized for AI computing; it efficiently executes large-scale parallel computing tasks and is particularly well suited to deep learning and machine learning workloads.
Computing power: INT8 integer precision computing power reaches 256 TeraOPS (trillion integer operations per second).
The half-precision (FP16) computing power is 128 TeraFLOPS (trillion floating-point operations per second).
Power consumption: Maximum 120W
Process technology: 7nm.
Application scenarios: Baidu Kunlun 2 chip is suitable for AI computing needs in multiple scenarios such as cloud, edge, and device.
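The INT8 figures quoted for these chips refer to 8-bit integer arithmetic, which is typically reached by quantizing FP32 model weights. A minimal sketch of symmetric linear quantization, a common general technique (illustrative only, not any particular vendor's scheme; the weight values are made up):

```python
def quantize_int8(values):
    # Symmetric linear quantization: map the largest magnitude to 127
    scale = max(abs(v) for v in values) / 127.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate floating-point values from INT8 codes
    return [q * scale for q in quantized]

weights = [0.82, -0.41, 0.05, -1.27]      # hypothetical FP32 weights
q, scale = quantize_int8(weights)          # q holds values in [-127, 127]
restored = dequantize(q, scale)            # close to the original weights
```

Because an INT8 multiply-accumulate is cheaper in silicon than an FP16 one, chips such as the Ascend 910 quote roughly double the throughput for INT8 (512 TOPS) versus FP16 (256 TFLOPS).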
Nvidia H100
Architecture: Hopper architecture
Computing power (SXM version; the TF32/FP16/BF16/INT8 values are peak Tensor Core throughput with structured sparsity):
FP64 Tensor Core: 67 TFLOPS
TF32 Tensor Core: 989 TFLOPS
FP16 Tensor Core: 1,979 TFLOPS
BF16 Tensor Core: 1,979 TFLOPS
INT8 Tensor Core: 3,958 TOPS
Power consumption: 700W
Process: TSMC 4N (4nm-class)
Application scenarios: Machine learning, deep learning training and inference, scientific computing simulation, data analysis, natural language processing, etc.
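Using only the INT8 throughput and maximum power figures quoted above, one can make a rough energy-efficiency comparison (TOPS per watt). Note that the H100 number is a peak Tensor Core figure with sparsity and the chips target different market segments, so this is an order-of-magnitude sketch, not a rigorous benchmark:

```python
# (INT8 TOPS, max power in watts), as quoted in the sections above
chips = {
    "Ascend 910": (512, 350),
    "MLU370":     (256, 250),
    "Journey 5":  (128, 30),
    "Kunlun 2":   (256, 120),
    "H100":       (3958, 700),
}

# Efficiency in TOPS per watt, sorted from highest to lowest
efficiency = {name: tops / watts for name, (tops, watts) in chips.items()}
for name, eff in sorted(efficiency.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {eff:.2f} TOPS/W")
```

The low-power Journey 5 compares well per watt because it targets in-vehicle deployment, while the H100's raw throughput dwarfs the rest.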
It is evident that while domestic AI chips have achieved certain successes in design and application, a performance gap still exists compared to leading international companies like Nvidia. Domestic AI chips also face a series of key challenges:
1. Technological barriers and core patents: Chinese companies lag behind international leaders in high-end chip design, EDA tools, IP cores, and advanced manufacturing processes. At 7nm and below in particular, they remain highly dependent on foreign technologies and equipment and face the risk of sanctions.
2. Market competition and brand awareness: Although Huawei and other manufacturers have a significant influence in the domestic market, Nvidia, Intel, AMD and other companies still dominate the AI chip field in the international market. It will take time for Chinese companies to build brand influence and customer trust globally.
3. Talent reserves and training: The research and design of high-end AI chips requires a large number of professionals across a wide range of technical fields, including integrated circuit design, algorithm optimization, and materials science. China needs to further strengthen its talent training and recruitment efforts to support the long-term development of the industry.
With the continuous efforts and innovation of domestic enterprises, this gap is expected to gradually narrow in the future. At the same time, the government should increase its support for the AI chip industry to promote its rapid development in China.