
People Can’t Distinguish AI Voice Clones From Actual Humans Anymore


The ability to synthesize realistic speech using AI has a host of applications, both benign and malicious. New research shows that today’s AI-generated voices are now indistinguishable from those of real humans.

AI’s ability to generate speech has improved dramatically in recent years. Many services are now capable of carrying out extended conversations. Typically, these tools can both clone the voices of real people and generate entirely synthetic voices.

This could make powerful AI capabilities far more accessible and raises the prospect of AI agents stepping into a range of customer-facing roles in the real world. But there are also fears these capabilities are powering an explosion of voice cloning scams, where bad actors use AI to impersonate family members or celebrities in an effort to manipulate victims.

Historically, synthesized speech has had a robotic quality that’s made it relatively easy to recognize, and even early AI-powered voice clones gave themselves away with their too-perfect cadence or occasional digital glitches. But a new study has found that the average listener can no longer distinguish between real human voices and deepfake clones made with consumer tools.

“The process required minimal expertise, only a few minutes of voice recordings, and almost no money,” Nadine Lavan at Queen Mary University of London, who led the research, said in a press release. “It just shows how accessible and sophisticated AI voice technology has become.” 

To test people’s ability to distinguish human voices from AI-generated ones, the researchers created 40 entirely synthetic AI voices and 40 clones of human voices drawn from a publicly available dataset. They used the AI voice generator tool from startup ElevenLabs, and each clone took roughly four minutes of voice recordings to create.

They then challenged 28 participants to rate how real the voices sounded on a scale and make a binary judgment about whether they were human or AI-generated. In results published in PLOS One, the authors found that although people could to some extent distinguish human voices from entirely synthetic ones, they couldn’t tell the difference between voice clones and real voices.
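The binary-judgment part of the task boils down to a simple discrimination-accuracy tally: how often a listener's human/AI call matches the true origin of the voice, where 50% is chance. The sketch below is a hypothetical illustration of that calculation, not the authors' analysis code, and the trial data are invented:

```python
# Minimal sketch of scoring one listener's binary human-vs-AI judgments.
# Each trial pairs the voice's true origin with the listener's call.
# These trial values are invented for illustration.

trials = [
    ("human", "human"), ("clone", "human"), ("human", "clone"),
    ("clone", "clone"), ("human", "human"), ("clone", "human"),
    ("human", "human"), ("clone", "clone"),
]

# Count trials where the listener's judgment matched the true label.
correct = sum(1 for truth, judged in trials if truth == judged)
accuracy = correct / len(trials)

print(f"discrimination accuracy: {accuracy:.1%}")  # 50.0% would be chance
```

If clones were truly indistinguishable, accuracy on the clone-vs-human trials would hover around the 50% chance level, which is the pattern the study reports.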

The study also sought to understand whether AI-generated voices had become “hyper-realistic.” Studies have shown that AI image generation has improved to such a degree that AI-generated pictures of faces are often judged as more human than photos of real people.

However, the researchers found the fully synthetic voices were judged less real than human recordings, while the clones were rated about as real as the genuine voices. Still, participants reported the AI-generated voices seemed both more dominant and more trustworthy than their human counterparts.

Lavan notes that the ability to create ultra-realistic artificial voices could have positive applications. “The ability to generate realistic voices at scale opens up exciting opportunities,” she said. “There might be applications for improved accessibility, education, and communication, where bespoke high-quality synthetic voices can enhance user experience.”

But the results add to a growing body of research suggesting AI voices are quickly becoming impossible to detect. And Lavan says this has many worrying ethical implications in areas like copyright infringement, misinformation, and fraud.

While many companies have attempted to build guardrails into their models to prevent misuse, the rapid proliferation of AI technology and the inventiveness of malicious actors suggest this is a problem that is only going to get worse.
