However, the rapid development of AI places higher demands on the infrastructure of cloud service providers and enterprise data centers. Data, as the key "fuel" for AI development, must be effectively collected, protected, and transmitted. Organizations exploring new AI applications must address these challenges. To support the massive amounts of data and computing resources required for AI, we need to build more efficient and reliable network infrastructure.
Against this backdrop, Ethernet technology, with its mature and extensive ecosystem, is becoming a crucial support for AI network infrastructure. Ethernet demonstrates strong potential to meet the high demands of AI and provide a unified platform, significantly impacting the economic viability of AI. It enables consistent operating models across various networks and clouds, avoiding the high costs associated with maintaining multiple infrastructures.
Key Requirements for the Development of AI Networks
Speed: The rapid growth of AI workloads has driven demand for higher speeds in data centers and edge networks, propelling them toward next-generation rates of 400Gbit/s and even 800Gbit/s.
Privacy and security: Networks must process data efficiently while ensuring high-level encryption and security in multi-tenant environments to protect data privacy.
Edge inference: As enterprises deploy large language models (LLMs) or small language models (SLMs) and hybrid private AI clouds, deploying inference capabilities at the front end will become a key focus.
Short job completion time (JCT) and low latency: Optimizing the network for lossless transmission, and ensuring efficient bandwidth utilization through congestion management and load balancing, are key to achieving short JCT.
Flexible clustering: In AI data centers, processor clusters can be configured in various topologies. Optimizing performance requires avoiding oversubscription between layers or regions in order to reduce JCT.
Multi-tenancy support: For security reasons, AI networks must isolate each tenant's data streams from the others.
Standardized architecture: AI networks typically consist of backend infrastructure (for training) and frontend infrastructure (for inference). Ethernet's versatility allows technology to be reused between backend and frontend clusters.
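The oversubscription point above can be illustrated with a quick calculation. The sketch below is a minimal, hypothetical example (switch port counts and speeds are assumptions, not taken from any particular product): it computes the ratio of server-facing to spine-facing capacity on a leaf switch, where a ratio above 1.0 means collective traffic can queue at the uplinks and inflate JCT.

```python
def oversubscription_ratio(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of southbound (server-facing) to northbound (spine-facing) capacity.

    1.0 means a non-blocking layer; higher values mean the leaf is
    oversubscribed, so bursts from AI collectives queue and inflate JCT.
    """
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Hypothetical leaf: 32 x 400G ports toward GPUs, 8 x 800G uplinks to spines.
print(oversubscription_ratio(32, 400, 8, 800))  # 2.0 -> 2:1 oversubscribed
# A 1:1 (non-blocking) design would instead need 16 x 800G uplinks.
print(oversubscription_ratio(32, 400, 16, 800))  # 1.0
```

The same arithmetic applies between any two tiers of the fabric; the requirement in the list amounts to keeping this ratio at or near 1:1 wherever AI training traffic crosses a layer boundary.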
Continuous innovation in Ethernet technology
Ethernet technology continues to innovate and evolve to meet the increasing demands of artificial intelligence on network scale. Some key technological advancements include:
Packet spraying: This technique allows each network flow to use all paths to its destination simultaneously. By relaxing packet ordering, it keeps every Ethernet link busy and achieves near-optimal load balancing, enforcing ordering only when bandwidth-intensive operations in AI workloads require it.
Congestion Management: Ethernet-based congestion control algorithms are crucial for AI workloads. They prevent hotspots and distribute the load evenly across multiple paths, ensuring reliable transmission of AI traffic.
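The difference between classic per-flow ECMP and packet spraying can be sketched in a few lines. This is an illustrative model only (the link names and the round-robin spraying policy are assumptions, not any vendor's implementation): per-flow hashing pins an entire flow to one link, while a sprayer rotates successive packets of the same flow across all equal-cost paths.

```python
import itertools

def flow_hash_path(flow_id, paths):
    """Classic per-flow ECMP: every packet of a flow takes the same path,
    so one elephant flow can saturate a single link while others sit idle."""
    return paths[hash(flow_id) % len(paths)]

def make_sprayer(paths):
    """Packet spraying (toy model): successive packets of a flow rotate
    across all available paths, so no single link becomes a hotspot.
    The receiving NIC is assumed to tolerate or restore packet order."""
    rr = itertools.cycle(paths)
    return lambda _packet: next(rr)

paths = ["link0", "link1", "link2", "link3"]  # hypothetical equal-cost links
spray = make_sprayer(paths)
sent = [spray(pkt) for pkt in range(8)]
print(sent)  # 8 packets of one flow spread evenly: 2 per link
```

Real implementations pick paths based on live link load rather than simple rotation, but the key idea is the same: spread a single flow's packets rather than the flows themselves.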
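To make the congestion-management point concrete, here is a toy additive-increase / multiplicative-decrease loop driven by ECN-style congestion marks, in the spirit of (but much simpler than) schemes such as DCQCN. All parameters (starting rate, step size, cut factor, line rate) are invented for illustration:

```python
def aimd_rate(marks, start_gbps=100.0, step_gbps=5.0, cut=0.5,
              line_rate_gbps=400.0):
    """Toy AIMD sender: each unmarked round-trip adds bandwidth (up to line
    rate); each round-trip carrying a congestion mark halves the rate.
    Backing off on marks keeps switch queues shallow, which is what lets
    the fabric stay lossless under AI collective traffic."""
    rate, history = start_gbps, []
    for marked in marks:
        rate = rate * cut if marked else min(rate + step_gbps, line_rate_gbps)
        history.append(rate)
    return history

print(aimd_rate([False, False, True, False, False]))
# [105.0, 110.0, 55.0, 60.0, 65.0] -> ramp up, sharp cut on the mark, re-probe
```

Production algorithms add rate-increase stages, timers, and per-queue marking thresholds, but this captures the feedback loop the article refers to: senders probe for bandwidth and retreat quickly when the network signals congestion, so no path becomes a persistent hotspot.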
Unified and optimized enterprise infrastructure
Enterprises need to deploy a unified AI network infrastructure and operating model to reduce the cost of AI services and applications. Adopting standards-based Ethernet as the supporting technology is a core element. It ensures compatibility between front-end and back-end systems, avoiding standardization process barriers and economic impacts caused by different architectures. For example, Arista advocates establishing an "AI center" where GPUs are efficiently trained via a lossless network. Trained AI models are connected to AI inference clusters, allowing end users to easily query these models.
Ethernet's market advantages
Ethernet demonstrates strong competitiveness in AI deployments due to its openness, flexibility, and adaptability. Its performance surpasses that of InfiniBand, and its advantages will be further amplified as the Ultra Ethernet Consortium (UEC) gains momentum. Furthermore, Ethernet is more cost-effective, boasts a broader and more open ecosystem, provides a common, unified set of operations and skills for backend and frontend clusters, and offers opportunities for platform reuse across clusters. As AI use cases and services continue to expand, the opportunities for Ethernet infrastructure will increase significantly, whether at the core of hyperscale LLMs or at the enterprise edge. AI-ready Ethernet can meet these demands and deliver AI inference based on industry-specific proprietary data.
In conclusion, Ethernet technology plays a crucial role in AI network infrastructure. It can meet the diverse needs of AI in terms of speed, security, and edge inference. Through continuous technological innovation and broad ecosystem support, Ethernet provides enterprises with more efficient and economical solutions, promoting the widespread application and development of artificial intelligence.