
Broadcom CAST Performance Improvement Unlocks Efficiency in AI GPU Clusters
In high-stakes AI and high-performance computing environments, network congestion can silently erode performance. A recent joint study by Edgecore Networks and Broadcom reveals how Congestion Aware Sprayed Traffic (CAST) technology delivers up to 35.6% gains in collective communication benchmarks. This advancement, tested on AMD MI300X GPU clusters with Edgecore’s AIS800-64O 800G switches, offers data center operators a practical path to scalable AI infrastructure.
The findings highlight CAST’s ability to dynamically route traffic based on real-time round-trip time (RTT) metrics, outperforming static load balancing. For C-suite leaders scaling distributed training and inference workloads, these results signal a shift toward more predictable network behavior. As AI models grow in complexity, such optimizations become essential for maintaining a competitive edge.
The Challenge of Congestion in AI Networking
AI workloads demand massive data movement across GPU clusters, where even minor delays compound into hours of lost productivity. Traditional Ethernet fabrics often struggle under oversubscription, leading to packet loss and jitter that degrade collective operations like All-Reduce. Broadcom’s CAST addresses this by intelligently spraying traffic across multiple paths, prioritizing low-congestion routes.
This approach integrates seamlessly with the existing RoCEv2 (RDMA over Converged Ethernet v2) standard, avoiding the need for hardware overhauls. In practice, it reduces tail latency during peak loads, a common pain point in large-scale training runs. Data center architects can thus deploy CAST without disrupting current RDMA setups.
Network observability plays a pivotal role here. The study leveraged SONiC-based telemetry from Broadcom’s Thor 2 400G NICs, capturing granular metrics on Priority Flow Control (PFC) and Data Center Quantized Congestion Notification (DCQCN). Such visibility ensures precise tuning, bridging the gap between theoretical gains and real-world deployment.
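To make that visibility concrete, here is a minimal watcher sketch in Python: it polls cumulative per-port PFC pause counters and flags ports whose pause rate spikes. The counter source is a stub, and the port names, field semantics, and threshold are illustrative assumptions; a real deployment would read these values from SONiC’s telemetry interfaces (for example, its counters database or streaming telemetry) rather than from a simulated feed.

```python
# Illustrative PFC-pause watcher. read_pfc_pause_frames() is a stub
# standing in for a real SONiC telemetry source; the port names and
# threshold below are assumptions for demonstration only.
import random
import time

PORTS = [f"Ethernet{i}" for i in range(0, 16, 4)]

_sim = {}  # simulated monotonic counters, keyed by port

def read_pfc_pause_frames(port: str) -> int:
    """Stub: cumulative RX PFC pause-frame count for a port. Replace
    with a real read from SONiC telemetry (counters DB or streaming
    telemetry); the counter name and access path are assumptions."""
    _sim[port] = _sim.get(port, 0) + random.randint(0, 200)
    return _sim[port]

def watch(interval_s: float = 1.0, pause_rate_threshold: float = 100.0) -> None:
    """Flag ports whose PFC pause rate suggests a congestion hot spot."""
    last = {p: read_pfc_pause_frames(p) for p in PORTS}
    while True:
        time.sleep(interval_s)
        for port in PORTS:
            now = read_pfc_pause_frames(port)
            rate = (now - last[port]) / interval_s  # pause frames per second
            if rate > pause_rate_threshold:
                print(f"{port}: {rate:.0f} PFC pauses/s (possible hot spot)")
            last[port] = now

if __name__ == "__main__":
    watch()
```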
Benchmark Methodology and Cluster Configurations
Edgecore and Broadcom rigorously tested CAST across diverse topologies to mirror real data center variability. They focused on four core RCCL (ROCm Communication Collectives Library) operations central to AI/HPC: All-Reduce, All-Gather, Reduce-Scatter, and All-To-All. These primitives underpin distributed training frameworks like PyTorch and Horovod.
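To ground those primitives, the sketch below drives all four through torch.distributed, which dispatches to RCCL on ROCm builds of PyTorch (the backend is still named "nccl" there, and "cuda" devices map to HIP GPUs). It is a minimal sketch assuming a torchrun launch on a GPU node; the payload size and per-node GPU count are placeholders.

```python
# rccl_collectives.py: minimal tour of the four collectives benchmarked
# in the study, via torch.distributed. Launch with a placeholder GPU
# count, e.g.: torchrun --nproc_per_node=8 rccl_collectives.py
import os
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")  # dispatches to RCCL on ROCm
    world = dist.get_world_size()
    device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
    torch.cuda.set_device(device)

    # Placeholder payload: 1M floats (~4 MB) initialized to this rank's id.
    x = torch.full((1 << 20,), float(dist.get_rank()), device=device)

    # All-Reduce: every rank ends with the elementwise sum (gradient sync).
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    # All-Gather: every rank collects every rank's shard (sharded params).
    shards = [torch.empty_like(x) for _ in range(world)]
    dist.all_gather(shards, x)

    # Reduce-Scatter: sum across ranks; each rank keeps one shard (ZeRO-style).
    out = torch.empty_like(x)
    dist.reduce_scatter(out, shards, op=dist.ReduceOp.SUM)

    # All-To-All: pairwise shard exchange (expert and tensor parallelism).
    recv = [torch.empty_like(x) for _ in range(world)]
    dist.all_to_all(recv, shards)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```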
Configurations spanned oversubscribed (2:1), non-blocking (1:1), and undersubscribed (1:2) ratios; in a 2:1 oversubscribed fabric, for example, the aggregate bandwidth entering the leaf layer is twice the uplink capacity toward the spine, so contention is built in. Clusters of AMD MI300X GPUs were interconnected via Edgecore AIS800-64O switches and Thor 2 NICs, with end-to-end tuning across multi-rail RDMA fabrics to ensure comprehensive validation. Performance diagrams detailing these setups are available on Edgecore’s press release page.
This methodology underscores a system-level approach, where switch, NIC, and software stack optimizations align for maximum throughput. For network engineers, it provides a blueprint for replicating gains in production environments.
Key Performance Highlights Across Topologies
CAST consistently outperformed the baselines, with gains that scaled with workload intensity:
- Oversubscribed (2:1): Up to 26.7% improvement, critical for cost-sensitive deployments where aggregate server bandwidth exceeds uplink capacity.
- Non-blocking (1:1): Peak of 35.6% uplift, demonstrating full potential in balanced fabrics.
- Undersubscribed (1:2): Up to 29.8% enhancement, proving value even in high-bandwidth scenarios.
These metrics reflect sustained throughput under continuous load, not just burst performance. Comparative analysis against standard sprayed traffic shows CAST’s edge in adaptive routing.

How CAST Optimizes Multi-Path Communication
At its core, CAST monitors real-time congestion via RTT feedback and dynamically adjusts how flows are distributed. Unlike equal-cost multi-path (ECMP) hashing, which can overload hot spots when multiple flows hash onto the same link, CAST steers traffic toward the least-congested paths. The result is lower queue depths and fewer PFC pauses, stabilizing RDMA flows.
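The contrast can be illustrated with a toy scheduler. The sketch below is conceptual only: Broadcom’s actual CAST algorithm, its RTT probing, and its hardware pipeline are not public, so the RTT feed here is simulated and the path names are placeholders.

```python
# Toy contrast between static ECMP hashing and a congestion-aware pick.
# Conceptual only: this is not Broadcom's algorithm, and the RTT samples
# are simulated rather than measured in-band.
import random
import zlib

PATHS = ["spine0", "spine1", "spine2", "spine3"]

def ecmp_pick(flow_id: str) -> str:
    """Static ECMP: hash the flow identifier once; every packet of the
    flow is pinned to that path, so hot flows can pile onto one link."""
    return PATHS[zlib.crc32(flow_id.encode()) % len(PATHS)]

def congestion_aware_pick(rtt_us: dict) -> str:
    """Congestion-aware spray: choose the path with the lowest recent
    RTT sample, a stand-in for per-path congestion state."""
    return min(rtt_us, key=rtt_us.get)

# Simulated per-path RTT samples in microseconds; a real fabric would
# measure these continuously rather than draw random numbers.
rtt = {path: random.uniform(5.0, 50.0) for path in PATHS}

for flow in ("gpu0->gpu8", "gpu1->gpu9", "gpu2->gpu10"):
    print(f"{flow}: ecmp={ecmp_pick(flow)} congestion-aware={congestion_aware_pick(rtt)}")
```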
In AI clusters, where All-To-All patterns dominate during model sharding, such precision prevents cascading failures. Broadcom’s integration with Tomahawk 5 switches ensures low-latency forwarding at 800G speeds. Operators benefit from rapid link failure recovery, minimizing downtime in fault-prone environments.
Edgecore’s open networking expertise amplified these outcomes. Running Enterprise SONiC on the AIS800-64O platform, the team co-optimized the switch, NIC, and software layers in concert. This open ecosystem approach contrasts with proprietary stacks, offering flexibility for multi-vendor deployments.
For deeper technical insights, refer to Broadcom’s documentation on CAST or AMD’s RCCL overview.
Strategic Implications for AI Infrastructure Leaders
Data center decision-makers face mounting pressure to scale AI without proportional infrastructure hikes. Broadcom’s CAST performance improvement directly tackles this by boosting effective bandwidth utilization. In a market projected to see AI networking spend exceed $10 billion by 2027 (per Dell’Oro Group estimates), such efficiencies translate to tangible ROI.
Consider a 1,000-GPU cluster running large language model training: a 30% speedup could shave weeks off timelines, accelerating time-to-insight. CFOs will appreciate the RoCEv2 compatibility, preserving investments in existing NICs and cables. CTOs gain a tool for hybrid cloud strategies, where predictable latency supports edge-to-core bursting.
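The arithmetic behind a claim like that is worth making explicit: a collective-level speedup only compresses the communication share of the run. The sketch below applies a simple Amdahl’s-law estimate with assumed numbers (a 10-week run, half of it spent in collectives); real savings depend on how communication-bound the workload actually is.

```python
# Back-of-envelope Amdahl's-law estimate for the timeline claim above.
# All numbers are illustrative assumptions, not figures from the study.
baseline_weeks = 10.0      # hypothetical end-to-end training time
comm_fraction = 0.5        # assumed share of time spent in collectives
collective_speedup = 1.30  # the ~30% uplift applied to that share only

improved_weeks = baseline_weeks * ((1 - comm_fraction) + comm_fraction / collective_speedup)
print(f"{baseline_weeks:.1f} weeks -> {improved_weeks:.2f} weeks "
      f"({baseline_weeks - improved_weeks:.2f} weeks saved)")
# Prints: 10.0 weeks -> 8.85 weeks (1.15 weeks saved)
```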
Actionable Takeaways for Deployment
To leverage these gains, executives should prioritize:
- Telemetry Integration: Deploy SONiC with Thor 2 monitoring for PFC/DCQCN visibility.
- Topology Assessment: Start with 1:1 non-blocking tests to baseline CAST uplift (see the harness sketch after this list).
- Multi-Rail RDMA: Tune fabrics for RCCL operations, targeting All-Reduce bottlenecks.
- Vendor Collaboration: Partner with open networking providers like Edgecore for end-to-end validation.
- Failure Recovery Drills: Simulate link downs to verify CAST’s rapid convergence.
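For the topology-assessment step, a minimal baseline harness might wrap the all_reduce_perf binary from rccl-tests. The build path and sweep flags below follow nccl-tests conventions and are assumptions, as is the summary-line parsing; enabling CAST itself is a switch/NIC-side change made outside the script.

```python
# Sketch of a baseline-vs-CAST comparison around rccl-tests'
# all_reduce_perf (https://github.com/ROCm/rccl-tests). Binary path,
# flags, and output parsing are assumptions modeled on nccl-tests.
import re
import subprocess

ALLREDUCE = "./build/all_reduce_perf"  # assumed local rccl-tests build

def avg_busbw(label: str) -> float:
    """Run an All-Reduce sweep from 8 MB to 8 GB on 8 GPUs and parse
    the 'Avg bus bandwidth' summary line the tool prints at the end."""
    out = subprocess.run(
        [ALLREDUCE, "-b", "8M", "-e", "8G", "-f", "2", "-g", "8"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"Avg bus bandwidth\s*:\s*([\d.]+)", out)
    if match is None:
        raise RuntimeError(f"could not parse bus bandwidth for {label}")
    busbw = float(match.group(1))
    print(f"{label}: avg bus bandwidth {busbw:.2f} GB/s")
    return busbw

baseline = avg_busbw("static load balancing")
# ... reconfigure the fabric to enable CAST, then measure again ...
cast = avg_busbw("CAST enabled")
print(f"uplift: {(cast / baseline - 1) * 100:.1f}%")
```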
Taken together with the baseline harness above, these steps ensure measurable outcomes, backed by the study’s reproducible results.
Expert Perspectives on Congestion Mitigation
“Mitigating congestion is vital to AI networks,” notes Karen Schramm, VP of Architecture at Broadcom’s Data Center Solutions Group. She emphasizes CAST’s role in limiting bottlenecks while upholding RoCEv2 standards. This preserves ecosystem interoperability, a non-negotiable for heterogeneous clusters.
Nanda Ravindran, VP of Product Line Management at Edgecore Networks, adds: “As AI workloads scale, network predictability is critical. Integrating CAST with our 800G open solutions delivers congestion mitigation without compatibility trade-offs.” Their collaboration exemplifies how vendor partnerships drive innovation.
Such insights align with industry trends. Gartner’s 2025 forecast predicts 40% of AI data centers will adopt advanced congestion control by 2028, up from 15% today. CAST positions early adopters ahead of this curve.
Future Directions in Open AI Networking
This study reinforces open networking’s momentum in AI. Edgecore’s commitment to SONiC and high-speed Ethernet fabrics enables faster iteration than closed systems. Looking ahead, expect CAST enhancements for 1.6T switches and integration with emerging NVLink fabrics.
For HPC operators, the path forward involves hybrid metrics: blending throughput with energy efficiency. As MI300X successors arrive, CAST-like technologies will underpin exascale computing. Decision-makers should benchmark internally now to inform 2026 budgets.
About Edgecore Networks
Edgecore Networks Corporation is a wholly owned subsidiary of Accton Technology Corporation, the leading network ODM. Edgecore delivers wired and wireless networking products and solutions through channel partners and system integrators worldwide for AI/ML, cloud data center, service provider, enterprise, and SMB customers. A leader in open networking, Edgecore provides a full line of open WiFi access points, packet transponders, virtual PON OLTs, cell site gateways, aggregation routers, and 1G, 10G, 25G, 40G, 100G, 400G, and 800G data center switches that offer a choice of commercial and open-source NOS and SDN software.



