ElevenLabs CEO Says AI Audio Models Will Become Commodities

Updated: October 30, 2025

Reading Time: 3 minutes

During his appearance at TechCrunch Disrupt 2025, Mati Staniszewski, the co-founder and CEO of ElevenLabs, shared a thought-provoking prediction about AI audio technology. 

He believes that AI models, especially those powering audio, will eventually be commoditized.

It’s a surprising statement from the leader of a company best known for developing those very models. 

Standardization

Staniszewski explained that his company’s researchers have made significant progress in solving complex model architecture challenges within audio AI. 

That progress, he said, will continue for the next year or two as the company focuses on refining its models.

However, he also noted that the competitive advantage these models offer will not last forever.

“Over the long term, it will commoditize, over the next couple of years,” Staniszewski said. 

“Even if there are differences, which I think will be the truth for some voices, some languages, on its own, the differences will be smaller.”

In other words, the technology gap between companies will narrow. The distinctions between voice models will shrink, making the tools themselves more uniform and accessible.

Why Keep Building What Will Become Common?

A moderator asked Staniszewski why ElevenLabs continues to invest heavily in model development if he believes those models will eventually become standard technology. 

His answer was simple: timing still matters. In the short term, he said, model development remains the most important competitive edge.

“They’re still the biggest advantage and the biggest step change you can have today,” he explained.

Poor audio quality, he added, is still a problem across the industry. “If the AI voices or interactions don’t sound good, that’s still a problem that needs to be solved,” he said.

The solution, in his view, lies in building high-performing models internally. 

“The only way to solve it is… building the models yourself, and then, over the long term, there will be other players that will solve that, too.”

This approach puts ElevenLabs in a temporary but valuable lead, solving key audio challenges before the field becomes saturated.

Not a Fit for All

Mati Staniszewski, CEO of ElevenLabs
Image Credits: Jeff Spicer/Getty Images

Staniszewski also pointed out that companies searching for reliable and scalable use cases will continue to use different models for different needs. 

Not every application will require the same architecture or output. Some models, for example, may perform better in conversational AI, while others might excel in dubbing, localization, or content creation. 

This diversity will keep the market dynamic, even as individual models become more similar.

Multi-Modal AI

While voice remains the foundation of ElevenLabs’ work, Staniszewski said the next trend will be toward multi-modal or fused systems: AI models that generate or process multiple media types at once.

“So, you will create audio and video at the same time, or audio and LLMs at the same time in a conversational setting,” he said.

He referenced Google’s Veo 3 as a strong example of what can be achieved when audio and video models are combined. 

Such integrations, he said, will define the next phase of AI development, creating richer, more interactive user experiences.

Partnerships and Collaboration

Looking ahead, Staniszewski said ElevenLabs plans to form new partnerships and work with open-source technologies. 

The company hopes to merge its deep expertise in audio with the strengths of other leading AI systems. 

By collaborating rather than competing, ElevenLabs aims to accelerate innovation and expand its impact. 

The company sees value in sharing knowledge and combining capabilities across the AI landscape. 

Long-Term Value

For ElevenLabs, the long-term focus extends beyond model-building. The goal, Staniszewski explained, is to connect technical development with practical application.

He compared this strategy to Apple’s famous integration of software and hardware, which created a seamless user experience and a loyal customer base.

“The same way software and hardware were the magic for Apple,” he said, “we think the product and AI will be the magic for the generation of the best use cases.”

That analogy captures ElevenLabs’ vision: the real value lies not in the model alone but in how it’s used to create meaningful, high-quality products.

The Future

Staniszewski’s remarks offer insight into where the industry is heading. Over the next few years, AI audio models may lose their exclusivity, but their role will remain essential.

As the technology becomes more standardized, the competitive edge will shift toward integration, creativity, and execution. 

Companies that can blend AI audio with other forms of intelligence — whether visual, linguistic, or interactive — will likely lead the next wave of innovation.

For ElevenLabs, that means focusing on two parallel tracks:

  • Advancing the core model architecture to improve realism and scalability.
  • Building applications that make voice AI more accessible, useful, and human-like.

If Staniszewski is right, the next two years will mark a turning point in voice AI. Commoditization could mean faster access to high-quality models, lower costs, and more options for developers. 

It could also mean more natural-sounding, interactive voices across apps, games, and devices.

Lolade

Contributor & AI Expert