New Microsoft AI chip no threat to Nvidia, but growing LLM needs drive custom silicon

Microsoft has been developing a new artificial intelligence (AI) chip, internally code-named Athena, since as early as 2019, according to reporting from The Information today. The company could make Athena widely available for use within the company itself and OpenAI as early as next year.

Experts say these moves won't threaten Nvidia, but they do signal the need for hyperscalers to develop their own custom silicon.

The chip, like those developed in-house by Google (TPU) and Amazon (Trainium and Inferentia processor architectures), is designed to handle large language model (LLM) training. That is essential because the scale of advanced generative AI models is growing faster than the compute capabilities needed to train them, Gartner analyst Chirag Dekate told VentureBeat by email.

Nvidia is the market leader by a mile when it comes to supplying AI chips, with about 88% market share, according to Jon Peddie Research. Companies are vying just to reserve access to the high-end A100 and H100 GPUs, which cost tens of thousands of dollars each, causing what could be described as a GPU crisis.

"Leading-edge generative AI models are now using hundreds of billions of parameters requiring exascale computational capabilities," Dekate explained. "With next-generation models ranging in trillions of parameters, it is no surprise that leading technology innovators are exploring diverse computational accelerators to accelerate training while reducing the time and cost of training involved."

As Microsoft seeks to accelerate its generative AI strategy while cutting costs, it makes sense that the company develop a differentiated custom AI accelerator strategy, he added, which "could help them deliver disruptive economies of scale beyond what is possible using traditional commoditized technology approaches."
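To give a rough sense of the scale Dekate describes, here is a back-of-the-envelope sketch in Python. It assumes the commonly cited rule of thumb that training takes about 6 floating-point operations per parameter per training token. That rule, the token counts, and the sustained throughput of one exaFLOP per second are illustrative assumptions, not figures from Gartner, Microsoft or Nvidia.

```python
SECONDS_PER_DAY = 86_400


def training_flops(parameters, tokens):
    """Approximate total training FLOPs with the 6 * N * D rule of thumb (an assumption)."""
    return 6 * parameters * tokens


def training_days(total_flops, sustained_flops_per_second):
    """Days needed at a given sustained throughput, ignoring all real-world overheads."""
    return total_flops / sustained_flops_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    exaflop_per_second = 1e18  # hypothetical sustained throughput

    # Hypothetical model sizes and token counts, chosen only for illustration.
    scenarios = [
        ("hundreds of billions of parameters", 175e9, 300e9),
        ("a trillion parameters", 1e12, 2e12),
    ]
    for label, params, tokens in scenarios:
        flops = training_flops(params, tokens)
        days = training_days(flops, exaflop_per_second)
        print(f"{label}: {flops:.2e} FLOPs, about {days:.1f} days at 1 exaFLOP/s")
```

Under these assumptions, required compute grows with the product of parameters and tokens, which is one way to see why larger models put pressure on both training time and cost.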

The need for acceleration also applies, importantly, to AI chips that support machine learning inference: the stage at which a trained model, boiled down to a set of weights, is applied to live data to produce actionable results. Compute infrastructure is used for inference every time ChatGPT generates responses to natural language inputs, for example.
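The idea of inference as fixed weights applied to live data can be sketched in a few lines of Python. This is a toy linear model with made-up weights, not how ChatGPT or any production system is implemented; the names `WEIGHTS`, `BIAS` and `infer` are hypothetical.

```python
# Hypothetical weights standing in for a trained model; values are illustrative only.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = 0.25


def infer(features):
    """Apply the fixed weights to one live input. No weights are updated here."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


if __name__ == "__main__":
    # Each incoming request triggers one pass like this.
    print(infer([1.0, 2.0, 3.0]))
```

Training would adjust `WEIGHTS` over many passes through data, while inference only reads them, once per request.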

Nvidia produces very powerful, general-purpose AI chips and offers its parallel computing platform CUDA (and its derivatives) as a way to do ML training specifically, said analyst Jack Gold, of J Gold Associates, in an email to VentureBeat. But inference generally requires less performance, he explained, and the hyperscalers see a way to also impact the inference needs of their customers with customized silicon.

"Inference ultimately will be a much larger market than ML, so it's important for all of the vendors to offer products here," he said.

Gold said he doesn't see Microsoft's Athena as much of a threat to Nvidia's place in AI/ML, where it has dominated since the company helped power the deep learning "revolution" of a decade ago, built a powerhouse platform strategy and software-focused approach, and saw its stock rise in an era of GPU-heavy generative AI.

"As needs expand and diversity of use expands as well, it's important for Microsoft and the other hyperscalers to pursue their own optimized versions of AI chips for their own architectures and optimized algorithms (not CUDA-specific)," he said.

It's about cloud operating costs, he explained, but also about providing lower-cost options for diverse customers who may not need or want the high-cost Nvidia option. "I expect all of the hyperscalers to continue to develop their own silicon, not just to compete with Nvidia, but also with Intel in general-purpose cloud compute."

Dekate also maintained that Nvidia shows no signs of slowing down. "Nvidia continues to be the primary GPU technology driving extreme-scale generative AI development and engineering," he said. "Enterprises should expect Nvidia to continue building on its leadership-class innovation and drive competitive differentiation as custom AI ASICs emerge."

But he pointed out that "innovation in the last phase of Moore's law will be driven by heterogenous acceleration comprising GPUs and application-specific custom chips." This has implications for the broader semiconductor industry, he explained, especially "technology providers that have yet to meaningfully engage in addressing the needs of the rapidly evolving AI market."
