DriveNets, a leader in cloud-native networking solutions, announced the introduction of DriveNets Network Cloud-AI, an innovative artificial intelligence (AI) networking solution designed to maximize the utilization of AI infrastructures and improve the performance of large-scale AI workloads. Built on DriveNets’ Network Cloud – which is deployed in the world’s largest networks – Network Cloud-AI has been validated by leading hyperscalers in recent trials as the most cost-effective Ethernet solution for AI networking. With this new offering, DriveNets is well-positioned to address the growing AI networking segment – a $10B market opportunity.
With the fast growth of AI workloads, network solutions that are used in the fabric of AI clusters need to evolve to maximize the utilization of costly AI compute resources. Simply put, AI workloads perform most effectively when the network is able to operate at 100% utilization.
“AI compute resources are extremely costly and must be fully utilized to avoid ‘idle cycles’ as they await networking tasks,” said Ido Susan, DriveNets co-founder and CEO. “Leveraging our experience supporting the world’s largest networks, we have developed DriveNets Network Cloud-AI. Network Cloud-AI has already achieved up to a 30% reduction in idle time in recent trials, enabling exponentially higher AI throughput compared to a standard Ethernet solution. This reduction also means the network effectively ‘pays for itself’ through more efficient use of AI resources.”
Until now, AI networks have been based either on traditional Ethernet leaf-and-spine architectures, which were not designed to support high-performance AI workloads at scale, or on proprietary solutions such as Nvidia’s InfiniBand, which do not support network interoperability and offer little flexibility for hyperscalers looking to avoid “vendor lock-in.” DriveNets Network Cloud-AI offers the best of both worlds. It delivers a 30% improvement in JCT (Job Completion Time) for large-scale AI workloads, substantially improving resource utilization, while running over standard Ethernet, which allows for vendor interoperability and choice.
“Network Cloud-AI provides balanced fabric connectivity between all GPUs in a cluster just as InfiniBand does,” said Susan. “The difference is that Network Cloud-AI interfaces with servers over standard Ethernet. InfiniBand uses proprietary equipment, which creates vendor lock-in at both the networking and GPU levels.”
A Distributed Networking Model
DriveNets Network Cloud-AI is based on OCP’s Distributed Disaggregated Chassis (DDC) architecture, which is built on a distributed leaf-and-spine model designed to support high-scale service provider networks. This architecture has now proven itself as an AI networking solution, offering the following benefits:
- Highest scale – connects up to 32,000 GPUs at speeds ranging from 100G to 800G to a single AI cluster with perfect load balancing
- Maximum utilization – equally distributes traffic across the AI network fabric, ensuring maximum network utilization and zero packet loss under the highest loads
- Shortest JCT – supports congestion-free operations through end-to-end traffic scheduling, avoids flow collisions and jitter, and provides zero-impact failover with sub-10ms automatic path convergence
- Openness – is an Ethernet-based solution that avoids proprietary approaches and supports vendor interoperability with a variety of white box manufacturers (ODMs), Network Interface Cards (NICs), and AI accelerator ASICs
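The even traffic distribution described above can be illustrated with a toy sketch. Cell-based fabrics such as DDC chop packets into fixed-size cells and spray them across all fabric links, rather than hashing whole flows onto single links. The cell size, link count, and packet mix below are illustrative assumptions, not details of DriveNets’ implementation.

```python
# Toy sketch of cell-based traffic spraying, the load-balancing idea behind
# a scheduled fabric like DDC. All constants here are assumed for illustration.

CELL_SIZE = 256  # bytes per fabric cell (assumed value)

def spray(packet_sizes, num_links):
    """Chop each packet into fixed-size cells and distribute the cells
    round-robin across all fabric links; return bytes carried per link."""
    loads = [0] * num_links
    link = 0
    for size in packet_sizes:
        cells = -(-size // CELL_SIZE)  # ceiling division
        for _ in range(cells):
            loads[link] += CELL_SIZE
            link = (link + 1) % num_links
    return loads

# A mixed packet-size workload that would skew a per-flow hash balancer
loads = spray([9000, 64, 1500, 9000, 64, 1500] * 100, num_links=8)
print(max(loads) - min(loads))  # per-link load spread stays within one cell
```

Because the unit of distribution is a small fixed-size cell rather than a variable-size flow, no link can end up carrying disproportionately large "elephant" flows, which is the intuition behind the "perfect load balancing" claim.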
Substantial Cost Savings and Scaling to Support a Growing Market
DriveNets Network Cloud-AI is the highest-scale DDC implementation in the market today. Early trials by leading hyperscalers using Network Cloud-AI over white boxes with Broadcom’s Jericho family chipset achieved up to 30% improvement in JCT compared to other Ethernet solutions. This improvement can result in up to 10% reduction in the entire AI cluster cost.
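The arithmetic behind the claim that faster job completion reduces cluster cost can be sketched as follows. The GPU count, price per GPU-hour, and baseline job duration below are illustrative assumptions, not published figures; only the 30% JCT improvement comes from the trials described above.

```python
# Back-of-the-envelope arithmetic: a shorter JCT means costly GPUs spend
# fewer hours per job. All figures except the 30% JCT gain are assumptions.

gpus = 4096                  # assumed cluster size
cost_per_gpu_hour = 2.0      # assumed dollars per GPU-hour
baseline_jct_hours = 100.0   # assumed job duration on standard Ethernet
jct_improvement = 0.30       # 30% shorter JCT, per the trial results

improved_jct = baseline_jct_hours * (1 - jct_improvement)
baseline_cost = gpus * cost_per_gpu_hour * baseline_jct_hours
improved_cost = gpus * cost_per_gpu_hour * improved_jct
print(baseline_cost - improved_cost)  # dollars of GPU time saved per job
```

How much of this per-job compute saving shows up as a reduction in total cluster cost depends on the job mix and on the network’s share of the cluster budget, which is how a 30% JCT gain can translate into the up-to-10% figure cited above rather than a full 30%.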
“Ethernet has proven time and again to be the best choice for all networking needs by enabling an open, healthy, and competitive ecosystem,” said Ram Velaga, senior vice president and general manager, Core Switching Group, Broadcom. “Large-scale training and inference of AI models will benefit from networks that can perform at 100% utilization like DriveNets Network Cloud-AI. Broadcom’s Jericho3-AI delivers an Ethernet network with perfect load balancing and end-to-end congestion management, resulting in significant reduction in job completion time compared to any other alternative.”
Industry analyst firm 650 Group forecasts that the AI cluster connectivity market will grow from $2B in 2022 to more than $10B in 2027, with Ethernet representing the vast majority of the market.
“DriveNets is an innovator that disrupted the traditional high-scale networking market, showing that a disaggregated, white-box-based solution can deliver greater network scale at a lower cost,” said Alan Weckel, founder and technology analyst at 650 Group. “They are now ready to do it again in AI networking, participating in early high-scale AI trials and building on the experience acquired in building the world’s largest core network.”