
Three Stages in the Development of Large AI Models


The term "pre-trained large model" encompasses two meanings: "pre-training" and "large model". It means that after a model has been pre-trained on a large-scale dataset, it needs no fine-tuning, or only fine-tuning with a small amount of data, to directly support a wide range of applications.
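As a rough illustration of this "pre-train once, then reuse with little or no fine-tuning" idea, the sketch below applies a pre-trained model to a new classification task without any task-specific training. It assumes the Hugging Face transformers library and the public facebook/bart-large-mnli checkpoint, neither of which is mentioned in this article.

```python
# Minimal sketch: reusing a pre-trained model with no task-specific fine-tuning.
# Assumes the Hugging Face `transformers` library and a public NLI checkpoint
# (facebook/bart-large-mnli); both are illustrative choices, not from this article.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The central bank raised interest rates by 25 basis points.",
    candidate_labels=["finance", "sports", "technology"],
)
print(result["labels"][0], round(result["scores"][0], 3))  # best label and its score
```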

Large AI models represent a milestone technology in the advancement of artificial intelligence toward general intelligence. Their practical application has shifted the three essential elements of AI from "data, algorithms, and computing power" to "scenario, product, and computing power." Following the data-driven internet era and the computing-power-driven cloud computing era, we are entering an AI era built on large models.

The Development History of Large AI Models

► In terms of parameter scale, large AI models have gone through three stages: pre-trained models, large-scale pre-trained models, and ultra-large-scale pre-trained models. Parameter counts have increased by at least ten times each year, growing from hundreds of millions to the trillion level, and models with hundreds of billions of parameters are now the mainstream.

► From a technical architecture perspective, the Transformer is the mainstream algorithmic foundation of today's large models, and it gave rise to two main technical routes: GPT and BERT. BERT's best-known deployment is in Google's search engine, while after the release of GPT-3, the GPT route gradually became the mainstream for large models. Almost all current large language models with over a hundred billion parameters adopt the GPT-style decoder-only architecture, including Baidu's Wenxin Yiyan and Alibaba's Tongyi Qianwen. (A short code sketch contrasting the two routes appears after this list.)

► From the perspective of modality support, large AI models can be divided into large natural language processing models, large computer vision models, and large scientific computing models. They are gradually evolving from supporting a single task in a single modality, such as text, images, or speech, to supporting multiple tasks across multiple modalities.

► From an application perspective, large models can be divided into two types: general-purpose large models and industry-specific large models. General-purpose large models have strong generalization capabilities and can handle tasks in many scenarios with little or no fine-tuning, the equivalent of giving AI a "general education"; ChatGPT and Huawei's Pangu are examples. Industry-specific large models, by contrast, fine-tune a large model with industry knowledge, giving AI a "professional education" to meet the needs of fields such as energy, finance, manufacturing, and media. Examples include BloombergGPT in finance and Baidu Wenxin's aerospace model.
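As a rough sketch of the "professional education" route just described, the snippet below continues training a small general pre-trained model on a few lines of domain text. The model choice (gpt2), the toy finance sentences, and the hyperparameters are illustrative assumptions only; real industry models are fine-tuned on far larger curated corpora.

```python
# Hedged sketch of domain fine-tuning: further training a general pre-trained causal LM
# on a tiny domain corpus. Model, data, and hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

domain_texts = [  # stand-in for an industry knowledge corpus
    "Net interest margin measures the spread between lending and funding rates.",
    "The loan-to-value ratio caps leverage on collateralised lending.",
]

model.train()
for text in domain_texts:                                   # one tiny pass over the corpus
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss   # causal-LM loss on domain text
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print("final loss:", loss.item())
```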
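And as promised under the technical-architecture item above, here is a sketch contrasting the two routes: a BERT-style encoder fills in masked tokens, while a GPT-style decoder continues text left to right. It assumes the Hugging Face transformers library and the small public checkpoints bert-base-uncased and gpt2, chosen purely for illustration.

```python
# Sketch contrasting the BERT route (masked-language modelling) with the GPT route
# (left-to-right next-token generation). Checkpoints are illustrative choices.
from transformers import pipeline

# BERT route: bidirectional encoder predicts the masked token from both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Large models are changing the [MASK] industry.")[0]["token_str"])

# GPT route: decoder-only model predicts the next token, enabling open-ended generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large models are changing the", max_new_tokens=20)[0]["generated_text"])
```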

Currently, the development of large AI models is moving from relying purely on data of different modalities toward combining data with knowledge, interpretability, learning theory, and other elements, presenting a new pattern of broad, multi-front progress.

Development Stages of Large AI Models

The development of large-scale AI models has gone through three stages: the emerging stage, the consolidation stage, and the breakthrough stage.

► Emerging Stage (1950-2005): the stage of traditional neural network models, represented by CNNs. Starting from computer scientist John McCarthy's introduction of the concept of "artificial intelligence" in 1956, AI gradually evolved from systems based on small-scale expert knowledge to systems based on machine learning. The prototype of the convolutional neural network (CNN) appeared in 1980, and LeNet-5, the basic structure of the modern CNN, was introduced in 1998. Machine learning shifted from early shallow models to deep learning models, laying the groundwork for research in natural language generation and computer vision, and for the later iteration of deep learning frameworks and the development of large models.

► Consolidation Period (2006-2019): the stage of novel neural network models, represented by the Transformer. In 2013, the natural language processing model Word2Vec was released, introducing the "word vector" representation that converts words into vectors so that computers can better understand and process text. In 2014, the GAN (Generative Adversarial Network), hailed as one of the most powerful algorithmic models of the 21st century, was proposed, marking a new stage in deep learning research on generative models. In 2017, Google introduced the Transformer, a neural network architecture based on self-attention mechanisms, laying the foundation for the pre-training architectures of large models (a minimal sketch of the self-attention computation appears after this list). In 2018, OpenAI and Google released the GPT-1 and BERT large models respectively, making pre-trained large models the mainstream in natural language processing. During this consolidation period, novel architectures represented by the Transformer established the algorithmic basis of large models and significantly improved their performance.

► Breakthrough Period (2020-Present): the stage of pre-trained large models, represented by GPT.

In 2020, OpenAI released GPT-3, a language model with 175 billion parameters, the largest at the time, which achieved significant performance gains on zero-shot learning tasks. Subsequently, strategies such as reinforcement learning from human feedback (RLHF), code pre-training, and instruction fine-tuning emerged to further enhance reasoning ability and task generalization. In November 2022, ChatGPT, powered by GPT-3.5, was released and quickly became an online sensation thanks to its realistic natural language interaction and multi-scenario content generation capabilities. In March 2023, the ultra-large-scale multimodal pre-trained model GPT-4 was released, with multimodal understanding and multi-type content generation capabilities. In this period of rapid development, the combination of big data, high computing power, and advanced algorithms greatly improved the pre-training and generation capabilities of large models, as well as their multimodal and multi-scenario applications. ChatGPT's tremendous success rests on the powerful computing capabilities of Microsoft Azure, massive training data drawn from sources such as Wikipedia, and a strategy of fine-tuning a Transformer-based GPT model with reinforcement learning from human feedback (RLHF).
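The consolidation and breakthrough periods above rest on two building blocks that short sketches can make concrete. First, the self-attention mechanism at the core of the Transformer computes softmax(QK^T / sqrt(d_k))·V; the NumPy sketch below is a generic illustration of that formula, with made-up shapes and data, and is not code from any system named in this article.

```python
# Minimal sketch of scaled dot-product self-attention, the core operation of the
# Transformer architecture. Shapes and data are illustrative only.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])        # similarity of every token with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # attention-weighted mixture of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                        # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)      # -> (4, 8)
```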
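Second, the RLHF strategy mentioned above begins with a reward model trained on pairs of human-ranked responses. The PyTorch sketch below shows the standard pairwise preference loss as a generic illustration of the technique; it is not OpenAI's implementation, and the scalar reward values are invented.

```python
# Minimal sketch of the pairwise preference loss used to train an RLHF reward model.
# Generic illustration only; the scalar rewards below are toy values.
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Lower loss when the reward model scores the human-preferred response higher."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Rewards the model assigned to three (preferred, rejected) response pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(reward_model_loss(chosen, rejected))   # decreases as chosen pulls ahead of rejected
```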

Overview of Developments at Domestic and Foreign Companies

Currently, competition among domestic and international giants in the field of large-scale models is fierce. OpenAI has become the benchmark company leading the development of large models; following the release of the multimodal GPT-4, it is reportedly expected to release the more advanced GPT-5 in the fourth quarter of this year. Microsoft, through its investment in and cooperation with OpenAI, has integrated GPT-4 across its entire Office product line, announcing Microsoft 365 Copilot in March. On May 24, Microsoft announced that Windows 11 would support GPT-4.

Image source: CCID Consulting

On May 10, Microsoft's direct competitor Google launched its next-generation large model, PaLM 2, which has been integrated into more than 25 AI products and features, including the existing chatbot Bard, the AI-powered office assistant Duet AI, and AI-powered search. Meta, meanwhile, has joined the competition with its large model LLaMA, and Amazon has partnered with AI startup Hugging Face to support BLOOM, an open-source competitor to ChatGPT.

Domestically, industry, investment, and research players have all accelerated their deployment. First, leading domestic technology companies have released self-developed large models in quick succession: Baidu released the Wenxin Yiyan large model, Alibaba released its first ultra-large-scale language model, Tongyi Qianwen, and Tencent's Hunyuan team launched HunYuan-NLP-1T, a trillion-parameter Chinese NLP pre-trained model. Huawei's Pengcheng PanGu is billed as the industry's first hundred-billion-parameter Chinese NLP large model with both generation and understanding capabilities.

Second, investors and entrepreneurs are actively entering the large-model competition. Meituan co-founder Wang Huiwen invested US$50 million in AI large models, former Sogou CEO Wang Xiaochuan and former Sogou COO Ru Liyun co-founded Baichuan Intelligence, Langboat Technology released its controllable language generation model Mengzi (Mencius) MChat, and Westlake Xinchen launched its Xinchen Chat large model.

Third, universities and research institutions are actively developing large-scale models. Fudan University launched MOSS, the first ChatGPT-like large-scale model in China; Tsinghua University's Knowledge Engineering Laboratory and its technology transfer company, Zhipu AI, released ChatGLM; the Institute of Automation of the Chinese Academy of Sciences launched the multimodal large-scale model Zidong Taichu; and IDEA Research Institute CCNL launched the open-source general-purpose large-scale model "Jiang Ziya".

Currently, large-scale models face four challenges.

First, evaluation and validation: current evaluation datasets for large models are often academic benchmarks that are more like "toys" and cannot fully reflect the diverse problems and challenges of the real world. There is therefore an urgent need to evaluate models on diverse, complex, real-world datasets to ensure they can cope with real-world challenges.

Second, ethics and morality: models should be aligned with human values, and their behavior should meet human expectations. For systems this complex, failing to take such ethical issues seriously could lead to disastrous consequences for humanity.

Third, safety concerns: More effort needs to be put into improving the interpretability and supervision of the model. Safety should be an integral part of model development, not just an optional embellishment.

Fourth, development trends: will model performance keep improving as model size grows? This is a question that probably even OpenAI cannot answer. Our understanding of the seemingly magical phenomena exhibited by large models is still very limited, and our insight into their underlying principles remains scarce.

