Pushing the frontiers of audio generation

AiNEWS2025 by AiNEWS2025
2024-12-10


Technologies

Published
30 October 2024
Authors

Zalán Borsos, Matt Sharifi and Marco Tagliasacchi

An illustration depicting speech patterns, iterative progress on dialogue generation, and a relaxed conversation between two voices.

Our pioneering speech generation technologies are helping people around the world interact with more natural, conversational and intuitive digital assistants and AI tools.

Speech is central to human connection. It helps people around the world exchange information and ideas, express emotions and create mutual understanding. As our technology for generating natural, dynamic voices continues to improve, we’re unlocking richer, more engaging digital experiences.

Over the past few years, we’ve been pushing the frontiers of audio generation, developing models that can create high-quality, natural speech from a range of inputs, like text, tempo controls and particular voices. This technology powers single-speaker audio in many Google products and experiments (including Gemini Live, Project Astra, Journey Voices and YouTube’s auto dubbing) and is helping people around the world interact with more natural, conversational and intuitive digital assistants and AI tools.

Working together with partners across Google, we recently helped develop two new features that can generate long-form, multi-speaker dialogue for making complex content more accessible:

  • NotebookLM Audio Overviews turns uploaded documents into engaging and lively dialogue. With one click, two AI hosts summarize user material, make connections between topics and banter back and forth.
  • Illuminate creates formal AI-generated discussions about research papers to help make knowledge more accessible and digestible.

Here, we provide an overview of our latest speech generation research underpinning all of these products and experimental tools.

Pioneering techniques for audio generation

For years, we’ve been investing in audio generation research and exploring new ways of generating more natural dialogue in our products and experimental tools. In our previous research on SoundStorm, we first demonstrated the ability to generate 30-second segments of natural dialogue between multiple speakers.

This extended our earlier work, SoundStream and AudioLM, which allowed us to apply many text-based language modeling techniques to the problem of audio generation.

SoundStream is a neural audio codec that efficiently compresses and decompresses an audio input without compromising its quality. As part of the training process, SoundStream learns how to map audio to a range of acoustic tokens. These tokens capture all of the information needed to reconstruct the audio with high fidelity, including properties such as prosody and timbre.
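To make the idea of acoustic tokens more concrete, here is a toy sketch of residual vector quantization, the coarse-to-fine scheme described in the SoundStream paper. It is not the production codec: the codebooks are random stand-ins for learned ones, and the function names, sizes and dimensions are assumptions chosen for illustration.

```python
import random

def nearest(codebook, vector):
    """Return the index of the codebook entry closest to vector (squared L2)."""
    def sq_dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def rvq_encode(frame, codebooks):
    """Quantize one feature vector into one token per codebook level."""
    residual = list(frame)
    tokens = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        tokens.append(index)
        # Later levels only see what earlier levels failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruct the vector by summing the selected entry from each level."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out

rng = random.Random(0)
DIM, CODEBOOK_SIZE, LEVELS = 4, 16, 3  # hypothetical toy sizes
# Random stand-ins for learned codebooks; the scale shrinks at each level.
codebooks = [
    [[rng.gauss(0.0, 0.5 ** level) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for level in range(LEVELS)
]
frame = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
tokens = rvq_encode(frame, codebooks)   # one integer token per level
approx = rvq_decode(tokens, codebooks)  # approximate reconstruction of frame
print(tokens, approx)
```

In a real codec the frame would come from a learned encoder network and the reconstruction would be passed to a learned decoder. Here both are omitted so that only the tokenization step is visible.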

AudioLM treats audio generation as a language modeling task to produce the acoustic tokens of codecs like SoundStream. As a result, the AudioLM framework makes no assumptions about the type or makeup of the audio being generated, and can flexibly handle a variety of sounds without needing architectural adjustments, making it a good candidate for modeling multi-speaker dialogues.

Example of a multi-speaker dialogue generated by NotebookLM Audio Overview, based on a few potato-related documents.

Building upon this research, our latest speech generation technology can produce 2 minutes of dialogue, with improved naturalness, speaker consistency and acoustic quality, when given a script of dialogue and speaker turn markers. The model also performs this task in under 3 seconds on a single Tensor Processing Unit (TPU) v5e chip, in one inference pass. This means it generates audio over 40 times faster than real time.
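The speed-up figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, where the helper name `real_time_factor` is an illustrative assumption:

```python
def real_time_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds

audio_seconds = 2 * 60     # 2 minutes of dialogue, from the text
generation_seconds = 3.0   # stated upper bound on generation time
speedup = real_time_factor(audio_seconds, generation_seconds)
print(f"at least {speedup:.0f}x faster than real time")  # at least 40x faster than real time
```

Because generation takes less than 3 seconds, 40 is a lower bound on the speed-up, which matches the "over 40 times" wording.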

Scaling our audio generation models

Scaling our single-speaker generation models to multi-speaker models then became a matter of data and model capacity. To help our latest speech generation model produce longer speech segments, we created an even more efficient speech codec for compressing audio into a sequence of tokens, at as little as 600 bits per second, without compromising the quality of its output.

The tokens produced by our codec have a hierarchical structure and are grouped by time frames. The first tokens within a group capture phonetic and prosodic information, while the last tokens encode fine acoustic details.
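One way to picture this layout is as a list of frames, each holding a short coarse-to-fine list of tokens. The sketch below is illustrative only: the number of levels per frame, the token values and the helper names are assumptions, since the article does not specify them.

```python
def flatten_frames(frames):
    """Flatten [frame][level] tokens into one frame-major sequence."""
    return [token for frame in frames for token in frame]

def unflatten(sequence, levels):
    """Regroup a flat, frame-major sequence into frames of `levels` tokens."""
    if len(sequence) % levels != 0:
        raise ValueError("sequence length must be a multiple of levels")
    return [sequence[i:i + levels] for i in range(0, len(sequence), levels)]

def coarse_only(frames, keep=1):
    """Keep only the first `keep` tokens of each frame (the coarse content)."""
    return [frame[:keep] for frame in frames]

# Three hypothetical time frames with three levels each: the first token in a
# frame is the coarsest, the last carries the finest acoustic detail.
frames = [[12, 7, 3], [12, 9, 1], [40, 2, 8]]
flat = flatten_frames(frames)
print(flat)                 # [12, 7, 3, 12, 9, 1, 40, 2, 8]
print(coarse_only(frames))  # [[12], [12], [40]]
```

Keeping all tokens of a frame adjacent in the flat sequence means a model can finish one time step, coarse to fine, before moving to the next.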

Even with our new speech codec, producing a 2-minute dialogue requires generating over 5000 tokens. To model these long sequences, we developed a specialized Transformer architecture that can efficiently handle hierarchies of information, matching the structure of our acoustic tokens.
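For a sense of scale, the quoted figures can be turned into a rough budget. This is a back-of-the-envelope sketch: it assumes the 600 bits per second and the 5000 tokens describe the same 2-minute configuration, which the text does not state, and both are bounds ("as little as", "over") rather than exact values.

```python
BITRATE_BPS = 600   # "as little as 600 bits per second"
DURATION_S = 120    # a 2-minute dialogue
TOKENS = 5000       # "over 5000 tokens", treated here as exactly 5000

total_bits = BITRATE_BPS * DURATION_S    # bits needed to encode the dialogue
total_bytes = total_bits / 8
tokens_per_second = TOKENS / DURATION_S
bits_per_token = total_bits / TOKENS     # only meaningful under the assumption above

print(f"{total_bits} bits, about {total_bytes / 1000:.0f} kB")
print(f"{tokens_per_second:.1f} tokens per second of audio")
print(f"{bits_per_token:.1f} bits per token")
```

Under these assumptions the whole dialogue fits in roughly 9 kB of token data, which is why modeling it becomes a question of sequence length rather than raw data volume.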

With this technique, we can efficiently generate acoustic tokens that correspond to the dialogue, within a single autoregressive inference pass. Once generated, these tokens can be decoded back into an audio waveform using our speech codec.
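Autoregressive generation means each new token is sampled conditioned on everything generated so far. The loop below sketches that control flow with a hand-written stand-in for the model. The vocabulary size, the end marker and `toy_next_token_weights` are hypothetical, and a real system would use a trained Transformer and then pass the tokens to the codec decoder.

```python
import random

VOCAB_SIZE = 8   # hypothetical toy vocabulary
END_TOKEN = 0    # hypothetical end-of-sequence marker

def toy_next_token_weights(prefix):
    """Stand-in for a trained model: unnormalized weights for the next token."""
    last = prefix[-1] if prefix else 1
    weights = [1.0] * VOCAB_SIZE
    # Favor the token that follows the previous one (wrapping within 1..7).
    weights[(last % (VOCAB_SIZE - 1)) + 1] += 5.0
    # Forbid ending until the sequence has at least four tokens.
    weights[END_TOKEN] = 0.0 if len(prefix) < 4 else 1.0
    return weights

def generate(prompt, max_new_tokens, rng):
    """Sample one token at a time, feeding each choice back in as context."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        weights = toy_next_token_weights(sequence)
        token = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
        if token == END_TOKEN:
            break
        sequence.append(token)
    return sequence

tokens = generate(prompt=[3, 5], max_new_tokens=20, rng=random.Random(0))
print(tokens)
```

The prompt plays the role of the conditioning input (in the article, a script with speaker turn markers), and the returned sequence is what a codec would decode back into a waveform.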

Animation showing how our speech generation model produces a stream of audio tokens autoregressively, which are decoded back to a waveform consisting of a two-speaker dialogue.

To teach our model how to generate realistic exchanges between multiple speakers, we pretrained it on hundreds of thousands of hours of speech data. Then we finetuned it on a much smaller dataset of dialogue with high acoustic quality and precise speaker annotations, consisting of unscripted conversations from various voice actors and realistic disfluencies: the “umm”s and “aah”s of real conversation. This step taught the model how to reliably switch between speakers during a generated dialogue and to output only studio-quality audio with realistic pauses, tone and timing.

In line with our AI Principles and our commitment to developing and deploying AI technologies responsibly, we’re incorporating our SynthID technology to watermark non-transient AI-generated audio content from these models, to help safeguard against the potential misuse of this technology.

New speech experiences ahead

We’re now focused on improving our model’s fluency and acoustic quality, and on adding more fine-grained controls for features like prosody, while exploring how best to combine these advances with other modalities, such as video.

The potential applications for advanced speech generation are vast, especially when combined with our Gemini family of models. From enhancing learning experiences to making content more universally accessible, we’re excited to continue pushing the boundaries of what’s possible with voice-based technologies.

Acknowledgements

Authors of this work: Zalán Borsos, Matt Sharifi, Brian McWilliams, Yunpeng Li, Damien Vincent, Félix de Chaumont Quitry, Martin Sundermeyer, Eugene Kharitonov, Alex Tudor, Victor Ungureanu, Sertan Girgin, Jonas Rothfuss, Jake Walker and Marco Tagliasacchi.

We thank Leland Rechis, Ralph Leith, Paul Middleton, Poly Pata, Minh Truong and RJ Skerry-Ryan for their critical efforts on dialogue data.

We’re very grateful to our collaborators across Labs, Illuminate, Cloud, Speech and YouTube for their outstanding work bringing these models into products.

We also thank Françoise Beaufays, Krishna Bharat, Tom Hume, Simon Tokumine and James Zhao for their guidance on the project.
