
Audio Spectrogram Transformers Beyond the Lab

AiNEWS2025 by AiNEWS2025
2025-06-11
in Machine Learning


Want to know what draws me to soundscape analysis?

It’s a field that combines science, creativity, and exploration in a way few others do. First of all, your laboratory is wherever your feet take you — a forest trail, a city park, or a remote mountain path can all become spaces for scientific discovery and acoustic investigation. Secondly, monitoring a chosen geographic area is all about creativity. Innovation is at the heart of environmental audio research, whether it’s rigging up a custom device, hiding sensors in tree canopies, or using solar power for off-grid setups. Finally, the sheer volume of data is truly incredible, and as we know, in spatial analysis, all methods are fair game. From hours of animal calls to the subtle hum of urban machinery, the acoustic data collected can be vast and complex, and that opens the door to using everything from deep learning to geographical information systems (GIS) in making sense of it all.

After my earlier adventures with soundscape analysis of one of Poland’s rivers, I decided to raise the bar and design and implement a solution capable of analysing soundscapes in real time. In this blog post, you’ll find a description of the proposed method, along with some code that powers the entire process, mainly using an Audio Spectrogram Transformer (AST) for sound classification.

Outdoor/urban version of the sensor prototype (image by author)

Methods

Setup

There are many reasons why, in this particular case, I chose to use a combination of Raspberry Pi 4 and AudioMoth. Believe me, I tested a wide range of devices — from less power-hungry models of the Raspberry Pi family, through various Arduino versions, including the Portenta, all the way to the Jetson Nano. And that was just the beginning. Choosing the right microphone turned out to be even more complicated.

Ultimately, I went with the Pi 4 B (4GB RAM) because of its solid performance and relatively low power consumption (a current draw of ~700 mA when running my code). Additionally, pairing it with the AudioMoth in USB microphone mode gave me a lot of flexibility during prototyping. AudioMoth is a powerful device with a wealth of configuration options, e.g. a sampling rate from 8 kHz to a stunning 384 kHz. I have a strong feeling that, in the long run, this will prove to be a perfect choice for my soundscape studies.

AudioMoth USB Microphone configuration app. Remember to flash the device with the proper firmware before configuring it.

Capturing sound

Capturing audio from a USB microphone using Python turned out to be surprisingly troublesome. After struggling with various libraries for a while, I decided to fall back on the good old Linux arecord. The whole sound capture mechanism is encapsulated with the following command:

arecord -d 1 -D plughw:0,7 -f S16_LE -r 16000 -c 1 -q /tmp/audio.wav

I’m deliberately using a plug-in device to enable automatic conversion in case I would like to introduce any changes to the USB microphone configuration. AST is run on 16 kHz samples, so the recording and AudioMoth sampling are set to this value.

Pay attention to the generator in the code. It’s important that the device continuously captures audio at the time intervals I specify. I aimed to store only the most recent audio sample on the device and discard it after the classification. This approach will be especially useful later during larger-scale studies in urban areas, as it helps ensure people’s privacy and aligns with GDPR compliance.

import asyncio
import re
import subprocess
from tempfile import TemporaryDirectory
from typing import Any, AsyncGenerator

import librosa
import numpy as np


class AudioDevice:
    def __init__(
        self,
        name: str,
        channels: int,
        sampling_rate: int,
        format: str,
    ):
        self.name = self._match_device(name)
        self.channels = channels
        self.sampling_rate = sampling_rate
        self.format = format

    @staticmethod
    def _match_device(name: str):
        lines = subprocess.check_output(['arecord', '-l'], text=True).splitlines()
        devices = [
            f'plughw:{m.group(1)},{m.group(2)}'
            for line in lines
            if name.lower() in line.lower()
            if (m := re.search(r'card (\d+):.*device (\d+):', line))
        ]

        if len(devices) == 0:
            raise ValueError(f'No devices found matching `{name}`')
        if len(devices) > 1:
            raise ValueError(f'Multiple devices found matching `{name}` -> {devices}')
        return devices[0]

    async def continuous_capture(
        self,
        sample_duration: int = 1,
        capture_delay: int = 0,
    ) -> AsyncGenerator[np.ndarray, Any]:
        with TemporaryDirectory() as temp_dir:
            temp_file = f'{temp_dir}/audio.wav'
            command = (
                f'arecord '
                f'-d {sample_duration} '
                f'-D {self.name} '
                f'-f {self.format} '
                f'-r {self.sampling_rate} '
                f'-c {self.channels} '
                f'-q '
                f'{temp_file}'
            )

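            while True:
                # Record one sample. arecord overwrites the same temporary file,
                # so only the most recent audio is ever kept on disk.
                # Note: check_call blocks the event loop while recording.
                subprocess.check_call(command, shell=True)
                # librosa returns (samples, sampling rate); the rate is already known.
                data, _ = librosa.load(
                    temp_file,
                    sr=self.sampling_rate,
                )
                # Optional pause between consecutive captures.
                await asyncio.sleep(capture_delay)
                yield data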
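
A minimal sketch of how a generator like this might be consumed, so that each raw sample is processed once and then dropped. The names `fake_capture` and `process_stream` are hypothetical and not part of the code above; `fake_capture` is a stand-in that yields synthetic samples so the sketch runs without a microphone or arecord.

```python
import asyncio
from typing import AsyncGenerator, List


async def fake_capture(n_samples: int, sample_len: int = 4) -> AsyncGenerator[List[float], None]:
    """Hypothetical stand-in for a capture generator: yields synthetic samples."""
    for i in range(n_samples):
        await asyncio.sleep(0)  # hand control back to the event loop between samples
        yield [float(i)] * sample_len


async def process_stream(source: AsyncGenerator[List[float], None]) -> List[float]:
    """Consume samples one at a time, keeping only a per-sample summary."""
    summaries = []
    async for sample in source:
        # The raw sample is only referenced inside this iteration; after the
        # summary is stored, nothing keeps the audio itself alive.
        summaries.append(sum(sample) / len(sample))
    return summaries


summaries = asyncio.run(process_stream(fake_capture(3)))
print(summaries)
```

In a real deployment the summary step would presumably be replaced by classification and SPL calculation, with only their results kept.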

Classification

Now for the most exciting part.

Using the Audio Spectrogram Transformer (AST) and the excellent HuggingFace ecosystem, we can efficiently analyse audio and classify detected segments into over 500 categories.

Note that I’ve prepared the system to support various pre-trained models. By default, I use MIT/ast-finetuned-audioset-10-10-0.4593, as it delivers the best results and runs well on the Raspberry Pi 4. However, onnx-community/ast-finetuned-audioset-10-10-0.4593-ONNX is also worth exploring, especially its quantised version, which requires less memory and returns inference results more quickly.

You may notice that I’m not limiting the model to a single classification label, and that’s intentional. Instead of assuming that only one sound source is present at any given time, I apply a sigmoid function to the model’s logits to obtain independent probabilities for each class. This allows the model to express confidence in multiple labels simultaneously, which is crucial for real-world soundscapes where overlapping sources — like birds, wind, and distant traffic — often occur together. Taking the top five results ensures that the system captures the most likely sound events in the sample without forcing a winner-takes-all decision.

from pathlib import Path
from typing import Optional

import numpy as np
import pandas as pd
import torch
from optimum.onnxruntime import ORTModelForAudioClassification
from transformers import AutoFeatureExtractor, ASTForAudioClassification


class AudioClassifier:
    def __init__(self, pretrained_ast: str, pretrained_ast_file_name: Optional[str] = None):
        if pretrained_ast_file_name and Path(pretrained_ast_file_name).suffix == '.onnx':
            self.model = ORTModelForAudioClassification.from_pretrained(
                pretrained_ast,
                subfolder='onnx',
                file_name=pretrained_ast_file_name,
            )
            self.feature_extractor = AutoFeatureExtractor.from_pretrained(
                pretrained_ast,
                file_name=pretrained_ast_file_name,
            )
        else:
            self.model = ASTForAudioClassification.from_pretrained(pretrained_ast)
            self.feature_extractor = AutoFeatureExtractor.from_pretrained(pretrained_ast)

        self.sampling_rate = self.feature_extractor.sampling_rate

    async def predict(
        self,
        audio: np.ndarray,
        top_k: int = 5,
    ) -> pd.DataFrame:
        with torch.no_grad():
            inputs = self.feature_extractor(
                audio,
                sampling_rate=self.sampling_rate,
                return_tensors='pt',
            )
            logits = self.model(**inputs).logits[0]
            # Sigmoid (not softmax) gives an independent probability per class.
            proba = torch.sigmoid(logits)
            # Indices of the top_k most probable classes, highest first.
            top_k_indices = torch.argsort(proba)[-top_k:].flip(dims=(0,)).tolist()

            return pd.DataFrame(
                {
                    'label': [self.model.config.id2label[i] for i in top_k_indices],
                    # Convert to plain floats so the column does not hold tensors.
                    'score': proba[top_k_indices].tolist(),
                }
            )

To run the ONNX version of the model, you need to add Optimum to your dependencies.
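
To illustrate why a sigmoid suits overlapping sound sources, here is a small standard-library sketch that compares sigmoid and softmax scores on the same logits. The logit values and the helper `top_k_labels` are made up for illustration and do not come from the model.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Independent probability for a single class."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    """Probabilities that compete with each other and sum to 1."""
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k labels with the highest sigmoid score."""
    scored = [(label, sigmoid(z)) for label, z in logits.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up logits for a scene where three sources overlap.
logits = {"Bird": 2.0, "Wind": 1.5, "Traffic": 1.0, "Speech": -3.0}

independent = top_k_labels(logits, k=3)       # each score judged on its own
competing = softmax(list(logits.values()))    # scores forced to share one unit of mass

print(independent)
print(competing)
```

With the sigmoid, all three overlapping sources can score highly at once; with softmax, a high score for one class necessarily lowers the others.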

Sound pressure level

Along with the audio classification, I capture information on sound pressure level. This approach identifies not only what made the sound but also how strongly each sound was present. In that way, the system captures a richer, more realistic representation of the acoustic scene and can eventually be used to detect finer-grained noise pollution information.

import numpy as np
from maad.spl import wav2dBSPL
from maad.util import mean_dB


async def calculate_sound_pressure_level(audio: np.ndarray, gain=10 + 15, sensitivity=-18) -> np.ndarray:
    x = wav2dBSPL(audio, gain=gain, sensitivity=sensitivity, Vadc=1.25)
    return mean_dB(x, axis=0)

The gain (preamp + amp), sensitivity (dB/V), and Vadc (V) are set primarily for AudioMoth and confirmed experimentally. If you are using a different device, you must identify these values by referring to the technical specification.
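
To show how these three parameters enter the calculation, here is a standard-library sketch of the conventional conversion from normalised samples to dB SPL. It assumes the sensitivity is expressed in dB relative to 1 V/Pa and that samples lie in [-1, 1]; the function name `rms_db_spl` is hypothetical, and the sketch is not guaranteed to match scikit-maad's internals step for step.

```python
import math
from typing import Sequence

P_REF = 20e-6  # reference sound pressure in air: 20 micropascals


def rms_db_spl(wave: Sequence[float], gain_db: float, sensitivity_db: float, v_adc: float) -> float:
    """Estimate the RMS level in dB SPL of samples normalised to [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in wave) / len(wave))
    volts = rms * v_adc                              # full-scale samples -> volts at the ADC
    volts /= 10 ** (gain_db / 20)                    # undo the preamp + amp gain
    pascals = volts / 10 ** (sensitivity_db / 20)    # volts -> pascals via mic sensitivity
    if pascals <= 0:
        return float("-inf")                         # digital silence has no finite level
    return 20 * math.log10(pascals / P_REF)


# Sanity check: 1 V RMS with no gain and a 0 dB re 1 V/Pa microphone is 1 Pa.
print(rms_db_spl([1.0, -1.0], gain_db=0, sensitivity_db=0, v_adc=1.0))
```

A higher gain setting lowers the estimated level by the same number of decibels, which is why the gain has to match the actual device configuration.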

Storage

Data from each sensor is synchronised with a PostgreSQL database every 30 seconds. The current urban soundscape monitor prototype uses an Ethernet connection; therefore, I am not restricted in terms of network load. The device for more remote areas will synchronise the data every hour using a GSM connection.

label           score        device   sync_id                                sync_time
Hum             0.43894055   yor      9531b89a-4b38-4a43-946b-43ae2f704961   2025-05-26 14:57:49.104271
Mains hum       0.3894045    yor      9531b89a-4b38-4a43-946b-43ae2f704961   2025-05-26 14:57:49.104271
Static          0.06389702   yor      9531b89a-4b38-4a43-946b-43ae2f704961   2025-05-26 14:57:49.104271
Buzz            0.047603738  yor      9531b89a-4b38-4a43-946b-43ae2f704961   2025-05-26 14:57:49.104271
White noise     0.03204195   yor      9531b89a-4b38-4a43-946b-43ae2f704961   2025-05-26 14:57:49.104271
Bee, wasp, etc. 0.40881288   yor      8477e05c-0b52-41b2-b5e9-727a01b9ec87   2025-05-26 14:58:40.641071
Fly, housefly   0.38868183   yor      8477e05c-0b52-41b2-b5e9-727a01b9ec87   2025-05-26 14:58:40.641071
Insect          0.35616025   yor      8477e05c-0b52-41b2-b5e9-727a01b9ec87   2025-05-26 14:58:40.641071
Speech          0.23579548   yor      8477e05c-0b52-41b2-b5e9-727a01b9ec87   2025-05-26 14:58:40.641071
Buzz            0.105577625  yor      8477e05c-0b52-41b2-b5e9-727a01b9ec87   2025-05-26 14:58:40.641071
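
The rows above share one sync_id and sync_time per synchronisation. A minimal sketch of how such a batch could be written is shown below; it uses an in-memory SQLite database purely as a stand-in for PostgreSQL, and the table name `predictions` and the function `sync_batch` are assumptions, not the actual schema or code.

```python
import sqlite3
import uuid
from datetime import datetime
from typing import List, Tuple

Row = Tuple[str, float]  # (label, score)


def sync_batch(conn: sqlite3.Connection, device: str, rows: List[Row]) -> str:
    """Write one batch, tagging every row with the same sync_id and sync_time."""
    sync_id = str(uuid.uuid4())
    sync_time = datetime.now().isoformat(sep=" ")
    conn.executemany(
        "INSERT INTO predictions (label, score, device, sync_id, sync_time) "
        "VALUES (?, ?, ?, ?, ?)",
        [(label, score, device, sync_id, sync_time) for label, score in rows],
    )
    conn.commit()
    return sync_id


# Stand-in database; a real deployment would connect to PostgreSQL instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions "
    "(label TEXT, score REAL, device TEXT, sync_id TEXT, sync_time TEXT)"
)
batch_id = sync_batch(conn, "yor", [("Hum", 0.44), ("Mains hum", 0.39)])
print(batch_id)
```

Grouping rows by sync_id makes it straightforward to recover which predictions were uploaded together.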

Results

A separate application, built using Streamlit and Plotly, accesses this data. Currently, it displays information about the device’s location, temporal SPL (sound pressure level), identified sound classes, and a range of acoustic indices.

Streamlit analytical dashboard (image by author)

And now we are good to go. The plan is to extend the sensor network and reach around 20 devices scattered around multiple places in my city. More information about a larger area sensor deployment will be available soon.

Moreover, I’m collecting data from a deployed sensor and plan to share the data package, dashboard, and analysis in an upcoming blog post. I’ll use an interesting approach that warrants a deeper dive into audio classification. The main idea is to match different sound pressure levels to the detected audio classes. I hope to find a better way of describing noise pollution. So stay tuned for a more detailed breakdown soon.

In the meantime, you can read the preliminary paper on my soundscapes studies (headphones are obligatory).


This post was proofread and edited using Grammarly to improve grammar and clarity.
