Microsoft has announced the Maia 200, a new custom chip designed to scale AI inference more efficiently.
The company positions the chip as a core part of its long-term AI infrastructure strategy. This launch follows the Maia 100, released in 2023.
With Maia 200, Microsoft aims to run larger AI models faster while using less power. As a result, the company targets lower costs and greater stability across its AI systems.
Chip Design
Maia 200 is a clear technical upgrade, packing more than 100 billion transistors. That increase allows it to handle heavier AI workloads with ease.
In terms of performance, Maia 200 delivers over 10 petaflops at 4-bit precision. It also provides about 5 petaflops at 8-bit precision.
These gains are a substantial improvement over Maia 100. In practical terms, they mean faster responses and smoother performance.
Large models can run without strain, and future models will have room to grow.
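To put those peak figures in perspective, here is a rough back-of-envelope sketch. The FP4 number comes from the article; the model size and the two-FLOPs-per-parameter-per-token rule are illustrative assumptions, not Microsoft's figures, and real-world throughput is far lower once memory bandwidth, batching, and utilization are accounted for.

```python
# Back-of-envelope estimate only: a theoretical ceiling on tokens/second
# implied by the cited FP4 peak. Model size and the FLOPs-per-token rule
# are illustrative assumptions, not vendor data.
peak_flops_fp4 = 10e15              # ~10 petaflops at 4-bit precision (cited figure)
model_params = 70e9                 # hypothetical 70B-parameter model (assumption)
flops_per_token = 2 * model_params  # rough rule of thumb for one forward pass

ceiling_tokens_per_sec = peak_flops_fp4 / flops_per_token
print(f"Theoretical ceiling: {ceiling_tokens_per_sec:,.0f} tokens/sec")
```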
Inference
AI training often receives the most attention, but inference drives the majority of real-world AI costs.
Inference is the process of running a trained model. Every chatbot reply and every AI search depends on it.
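In code terms, inference is just a trained model's forward pass on new input. The minimal sketch below uses the Hugging Face transformers library with a small public model purely as an illustration; nothing here is specific to Maia hardware, but every call like this is the kind of work the article is describing.

```python
# Minimal illustration of inference: load a trained model once, then run it
# on new inputs. The model choice (gpt2) is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("AI inference is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # the inference step
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```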
As AI tools expand, inference workloads increase sharply, and energy use and hardware demand rise with them. This has pushed companies to seek more efficient hardware.
Microsoft's Maia 200 is one answer to that pressure.
Power Use and Stability
Power efficiency now plays a critical role in AI operations. Data centers run continuously, and even small improvements can create large savings.
Maia 200 focuses on doing more work with less energy. This approach reduces heat and lowers cooling needs. It also improves system reliability.
Microsoft says a single Maia 200 node can run today’s largest AI models with ease. At the same time, it leaves room for future growth.
For large-scale deployments, this balance matters.
Custom Chips
Major technology companies now design their own AI chips to reduce reliance on Nvidia.
Nvidia’s GPUs remain essential for AI training and deployment, but demand continues to exceed supply.
Google addressed this issue with its Tensor Processing Units. These chips power Google Cloud services. Amazon followed with Trainium, its in-house AI accelerator.
In December, Amazon released Trainium3. Like Google’s TPUs, Trainium helps offload work from Nvidia hardware.
With Maia, Microsoft enters this same competitive space.
Performance Claims
Microsoft has shared direct comparisons. According to the company, Maia 200 delivers three times the FP4 performance of third-generation Amazon Trainium chips.
Microsoft also states that Maia’s FP8 performance exceeds that of Google’s seventh-generation TPU.
If accurate, these figures would place Maia 200 among the top AI inference chips available today. While results may vary by workload, the early indicators are strong.
For cloud customers, this performance can translate into faster services and lower latency.
Production Use
Maia 200 is not limited to testing environments. Microsoft confirms the chip already supports internal AI workloads.
It powers models developed by the company’s Superintelligence team and helps run Copilot, Microsoft’s AI assistant.
This real-world use lets Microsoft refine the chip under production conditions. It also strengthens confidence in Maia’s readiness.
Community Access
Microsoft is also opening Maia 200 to external users. The company has invited developers, academic researchers, and frontier AI labs to use the Maia 200 software development kit.
This access promotes experimentation and feedback, and encourages adoption across different workloads.
For researchers and startups, Maia offers another option beyond traditional GPU-based systems.