Text-to-speech with feeling – this new AI model does everything but shed a tear


Image: Collage of mouths, speech bubbles and grammar symbols on a blue background (We Are/Getty Images)

Not so long ago, generative AI could only communicate with human users via text. Now it’s increasingly being given the power of speech — and this ability is improving by the day.

On Thursday, AI voice platform ElevenLabs introduced v3, described on the company’s website as “the most expressive text-to-speech model ever.” The new model can exhibit a wide range of emotions and subtle communicative quirks — like sighs, laughter, and whispering — making its speech more humanlike than the company’s previous models. 

In a demo shared on X, v3 was shown generating the voices of two characters, one male and the other female, who were having a lighthearted conversation about their newfound ability to speak in more humanlike voices. 

There’s certainly none of the Alexa-esque flatness of tone, but the v3-generated voices tend to be almost excessively animated, to the point that their laughter is more creepy than charming — take a listen yourself.

The model also speaks more than 70 languages, up from its predecessor v2’s limit of 29. It’s available now in public alpha, and its price has been slashed by 80% until the end of this month.

The future of AI interaction

AI-generated voice has become a major focus of innovation as tech developers look toward the future of human-machine interaction.

Automated assistants like Siri and Alexa have long been able to speak, of course, but as anyone who routinely uses these systems can attest, their voices are noticeably mechanical, with a narrow range of tone and emotional cadence. They’re useful for handling quick and easy tasks, like playing a song or setting an alarm, but they don’t make great conversation partners.

Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally realistic and engaging.

Also: You shouldn’t trust AI for therapy – here’s why

Users can prompt v3, for example, to speak in voices that are easily customizable through the use of “audio tags.” Think of these as stylistic filters that modify the output, and which can be inserted directly into text prompts: “Excited,” “Loudly,” “Sings,” “Laughing,” “Angry,” and so on.
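To make the idea concrete, here is a minimal sketch of how audio tags might be embedded in a prompt sent to a TTS API. The bracketed `[excited]`-style tag syntax, the endpoint path, the `eleven_v3` model identifier, and the `YOUR_VOICE_ID` placeholder are all assumptions for illustration, not details confirmed by this article:

```python
import json
import os
import urllib.request

# Assumed endpoint shape; voice_id is a placeholder, not a real voice.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def tagged_prompt(text: str, *tags: str) -> str:
    """Prepend bracketed audio tags (e.g. [excited], [laughing]) to prompt text."""
    prefix = "".join(f"[{t}]" for t in tags)
    return f"{prefix} {text}" if prefix else text

def build_request(text: str, model_id: str = "eleven_v3") -> dict:
    """Assemble the JSON body for a text-to-speech request (model_id is assumed)."""
    return {"text": text, "model_id": model_id}

prompt = tagged_prompt("Well, I never expected to sound this good.", "excited", "laughing")
payload = build_request(prompt)
print(json.dumps(payload))

# Only attempt a network call if a key is configured; otherwise this is a dry run.
api_key = os.environ.get("ELEVENLABS_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL.format(voice_id="YOUR_VOICE_ID"),
        data=json.dumps(payload).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()  # audio bytes on success
```

The point of the sketch is simply that the tags ride along inside the text field itself, so switching a voice from neutral to excited is a string edit rather than an API change.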

ElevenLabs isn’t the only company racing to build more lifelike TTS models, which big tech companies are selling as a more intuitive and accessible way to interact with AI.

In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which allows users to generate custom voices by describing them in natural language. Similarly nuanced conversational abilities are also now on offer through Google’s Gemini 2.5 Pro Flash model.
