Sage Advice Hub

AI headphones driven by Apple M2 can translate multiple speakers at once

Google’s Pixel Buds wireless earbuds have offered a fantastic real-time translation facility for a while now. Over the past few years, brands such as Timekettle have offered similar earbuds for business customers. However, all these solutions can only translate one audio stream at a time.

The folks over at the University of Washington (UW) have developed something truly remarkable: AI-driven headphones that can translate the voices of multiple speakers at once. Think of it as a polyglot in a crowded bar, able to follow the people around them, each speaking a different language, all at once.

How does multi-speaker translation work?

“For the first time, we’ve preserved the sound of each person’s voice and the direction it’s coming from,” explains Shyam Gollakota, currently a professor at UW’s Paul G. Allen School of Computer Science & Engineering.

The team likens their stack to a radar: it kicks into action by identifying the number of speakers in the surroundings, then updates that number in real time as people move in and out of listening range. The whole approach runs on-device and doesn’t involve sending user voice streams to a cloud server for translation. Yay, privacy!

In addition to speech translation, the kit also “maintains the expressive qualities and volume of each speaker’s voice.” Moreover, the direction and intensity of the audio are adjusted as the speaker moves across the room. Interestingly, Apple is also said to be developing a system that allows the AirPods to translate audio in real time.
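To get a feel for what preserving direction and volume might involve, here is a minimal Python sketch. It is not the UW team’s code: the function names, the constant-power panning rule, the Woodworth delay approximation, and the head-radius figure are all assumptions chosen for illustration, and a real binaural renderer would be considerably more sophisticated.

```python
import math
from typing import List, Tuple

SPEED_OF_SOUND = 343.0  # metres per second, roughly room temperature
HEAD_RADIUS = 0.0875    # metres; an assumed average, not a value from the study


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for an azimuth in [-90, 90] degrees.

    -90 is fully left, 0 is straight ahead, +90 is fully right.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # map azimuth to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def interaural_delay(azimuth_deg: float) -> float:
    """Woodworth approximation of the interaural time difference, in seconds."""
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    return HEAD_RADIUS / SPEED_OF_SOUND * (az + math.sin(az))


def render_binaural(
    translated_mono: List[float], azimuth_deg: float, source_rms: float
) -> Tuple[List[float], List[float]]:
    """Match the translated audio to the speaker's loudness, then pan it."""
    if not translated_mono:
        return [], []
    rms = math.sqrt(sum(s * s for s in translated_mono) / len(translated_mono))
    scale = source_rms / (rms or 1.0)  # avoid dividing by zero on silence
    left_gain, right_gain = pan_gains(azimuth_deg)
    left = [s * scale * left_gain for s in translated_mono]
    right = [s * scale * right_gain for s in translated_mono]
    return left, right
```

As the speaker walks across the room, a caller would re-run `render_binaural` with the updated azimuth and loudness for each new chunk of audio, so the translated voice appears to follow them.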

How does it all come to life?

The UW team tested the AI headphones’ translation capabilities in nearly a dozen outdoor and indoor settings. As far as performance goes, the system can capture speech, process it, and produce translated audio within 2-4 seconds. Test participants appeared to prefer a delay of 3-4 seconds, but the team is working to speed up the translation pipeline.

So far, the team has only tested Spanish, German, and French language translations, but they’re hopeful of adding more to the pool. Technically, they condensed blind source separation, localization, real-time expressive translation, and binaural rendering into a single flow, which is quite an impressive feat.
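The four stages named above can be pictured as one loop over the separated voices. The Python sketch below only illustrates that ordering; `translate_scene` and the toy stand-in stages are hypothetical names invented here, and the real system uses trained models for each step rather than these placeholders.

```python
from typing import Callable, List, Tuple

Audio = List[float]  # mono samples; a simplification for this sketch


def translate_scene(
    mixture: Audio,
    separate: Callable[[Audio], List[Audio]],
    localize: Callable[[Audio], float],
    translate: Callable[[Audio], Audio],
    render: Callable[[Audio, float], Tuple[Audio, Audio]],
) -> Tuple[Audio, Audio]:
    """Run separation, localization, translation and rendering in order,
    then mix every speaker's result into one left/right pair."""
    left_mix = [0.0] * len(mixture)
    right_mix = [0.0] * len(mixture)
    for voice in separate(mixture):          # blind source separation
        azimuth = localize(voice)            # where is this speaker?
        translated = translate(voice)        # expressive speech translation
        left, right = render(translated, azimuth)  # binaural rendering
        for i in range(min(len(left_mix), len(left), len(right))):
            left_mix[i] += left[i]
            right_mix[i] += right[i]
    return left_mix, right_mix


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_separate(mix: Audio) -> List[Audio]:
    return [[s * 0.5 for s in mix], [s * 0.5 for s in mix]]


def toy_localize(voice: Audio) -> float:
    return 0.0


def toy_translate(voice: Audio) -> Audio:
    return voice


def toy_render(voice: Audio, azimuth: float) -> Tuple[Audio, Audio]:
    return voice, voice


left, right = translate_scene(
    [1.0, 2.0], toy_separate, toy_localize, toy_translate, toy_render
)
```

Passing the stages in as functions keeps the flow readable and makes it easy to swap a placeholder for a real model later.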

As far as the hardware goes, the team developed a speech translation model capable of real-time inference on Apple’s M2 silicon. Audio duties were handled by a pair of Sony’s noise-cancelling WH-1000XM4 headphones and a Sonic Presence SP15C binaural USB mic.

And here’s the best part. “The code for the proof-of-concept device is available for others to build on,” says the institution’s press release. That means the scientific and open-source tinkering community can learn and base more advanced projects on the foundations laid out by the UW team.
