As AI systems become more capable, speech is fast becoming the default way we communicate with machines. French AI startup Mistral has jumped into the audio race with its first open model, aiming to challenge the dominance of walled-off corporate systems with open-weight alternatives.
On Tuesday, Mistral announced the release of Voxtral, its first family of audio models aimed at businesses.
The company is pitching Voxtral as the first open model capable of deploying “truly usable speech intelligence in production.”
In other words, developers will no longer have to choose between a cheap, open system that fumbles transcriptions and doesn’t really understand what’s being said, and a closed one that works well but leaves them with a higher bill and less control over deployment.
For businesses, that means Voxtral offers an affordable alternative that the company claims is “less than half the price” of comparable solutions.
Mistral says Voxtral can transcribe up to 30 minutes of audio. Thanks to its LLM backbone, Mistral Small 3.1, it can understand up to 40 minutes of audio, allowing users to ask questions about the content, generate summaries, or turn voice commands into real-time actions like calling APIs or running functions. Voxtral is also multilingual, able to transcribe and understand languages including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.
The company is offering two variants of its “speech understanding models.” The first, Voxtral Small, is a 24-billion-parameter model intended for production-scale deployments, and is competitive with ElevenLabs Scribe, GPT-4o-mini, and Gemini 2.5 Flash.