Qualcomm claims the AI200 and AI250 offer rack-scale performance and superior memory capacity for fast generative AI inference at high performance per dollar per watt.
The AI200 is a purpose-built, rack-level AI inference solution designed to deliver low total cost of ownership (TCO) and optimized performance for large language model (LLM) and large multimodal model (LMM) inference and other AI workloads.
It supports 768 GB of LPDDR per card for higher memory capacity and lower cost, enabling scale and flexibility for inference.
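For a rough sense of what that capacity means, here is an illustrative back-of-the-envelope sizing (the model sizes and precisions below are assumptions for illustration, not Qualcomm figures): a dense 70-billion-parameter model in 16-bit precision needs roughly 140 GB for weights alone, leaving most of a 768 GB card for KV cache, activations, or additional models.

    # Back-of-the-envelope memory sizing for LLM inference.
    # Illustrative only; model sizes and precisions are assumptions, not Qualcomm figures.
    CARD_MEMORY_GB = 768  # LPDDR capacity per AI200 card, per the announcement

    def weights_gb(params_billion, bytes_per_param):
        # Approximate weight footprint in GB (1 GB = 1e9 bytes) for a dense model.
        return params_billion * 1e9 * bytes_per_param / 1e9

    for params, dtype, nbytes in [(70, "FP16", 2), (70, "INT8", 1), (180, "FP16", 2)]:
        gb = weights_gb(params, nbytes)
        print(f"{params}B @ {dtype}: ~{gb:.0f} GB weights, "
              f"~{CARD_MEMORY_GB - gb:.0f} GB left for KV cache and activations")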
The AI250’s memory architecture is based on near-memory computing, improving efficiency and performance for AI inference workloads by delivering greater than 10x higher effective memory bandwidth and lower power consumption.
This enables disaggregated AI inferencing for efficient utilization of hardware while meeting customer performance and cost requirements.
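Disaggregated inference here means splitting the compute-bound prefill phase of LLM serving from the memory-bandwidth-bound decode phase so each can run on an independently sized pool of hardware. The following is a minimal conceptual sketch of that split; the class and method names are illustrative assumptions, not any Qualcomm API.

    # Conceptual sketch of disaggregated LLM serving: prefill and decode run on
    # separately provisioned worker pools, joined by handing off the KV cache.
    # All names here are illustrative assumptions, not a Qualcomm API.
    from dataclasses import dataclass

    @dataclass
    class KVCacheHandle:
        """Reference to a prompt's KV cache, transferable between worker pools."""
        request_id: str
        num_tokens: int

    class PrefillWorker:
        """Compute-bound stage: processes the full prompt once."""
        def prefill(self, request_id: str, prompt_tokens: list[int]) -> KVCacheHandle:
            # Run the prompt through the model and materialize the KV cache.
            return KVCacheHandle(request_id, len(prompt_tokens))

    class DecodeWorker:
        """Bandwidth-bound stage: generates tokens one at a time against the cache."""
        def decode(self, cache: KVCacheHandle, max_new_tokens: int) -> list[int]:
            # Each step re-reads the KV cache, so memory bandwidth dominates here.
            return [0] * max_new_tokens  # placeholder token ids

    # The two pools can be sized independently: a few prefill workers can feed
    # many decode workers, which is the utilization argument for disaggregation.
    prefill_pool, decode_pool = PrefillWorker(), DecodeWorker()
    cache = prefill_pool.prefill("req-1", prompt_tokens=[101, 2023, 2003])
    tokens = decode_pool.decode(cache, max_new_tokens=8)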
Both rack solutions feature direct liquid cooling for thermal efficiency, PCIe for scale up, Ethernet for scale out, confidential computing for secure AI workloads, and a rack-level power consumption of 160 kW.
“These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand,” said Qualcomm SVP Durga Malladi.
The hyperscaler-grade AI software stack, which spans end-to-end from the application layer to the system software layer, is optimized for AI inference.
The stack supports leading machine learning (ML) frameworks, inference engines, generative AI frameworks, and LLM/LMM inference optimization techniques such as disaggregated serving.
Developers benefit from seamless model onboarding and one-click deployment of Hugging Face models via Qualcomm Technologies’ Efficient Transformers Library and Qualcomm AI Inference Suite. Qualcomm says its software provides ready-to-use AI applications and agents, along with comprehensive tools, libraries, APIs, and services for operationalizing AI.
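For context, onboarding a Hugging Face model typically starts from the standard transformers API, as in the short sketch below; this is generic Hugging Face usage for illustration only and does not show the Qualcomm Efficient Transformers Library or AI Inference Suite calls themselves.

    # Illustrative Hugging Face model onboarding using the standard transformers API.
    # The Qualcomm-specific compile/deploy steps are not shown here.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "gpt2"  # any Hugging Face causal LM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("Rack-scale inference means", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))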
The AI200 and AI250 are expected to be commercially available in 2026 and 2027, respectively.
