
SUBTITLES FOR THE WORLD

GOOGLE MARKETING, Los Angeles / GOOGLE / 2023

Awards:

Shortlisted Cannes Lions

Overview


Why is this work relevant for Innovation?

Language is the bridge that connects us. It is how we express ourselves to others: what we feel, want, share, and love. Language barriers are deeply frustrating because they block human connection. Current applications of large language models and speech-to-text capabilities enable transcription and translation on handheld devices, but they sacrifice personal connection by breaking eye contact, and eye contact is how we know we are seen, heard, and understood. By combining these language capabilities with AR glasses, we can seamlessly augment human connection with real-time subtitles, giving wearers a private line-of-sight display that maintains eye contact and keeps their hands free.

Background

Language is fundamental to connecting with the people around us. Yet understanding someone who speaks a different language, or trying to follow a conversation if you are deaf or hard of hearing, can be a real challenge. We saw an opportunity for augmented reality (AR) to overcome these challenges, so we combined our advancements in translation technologies with hardware breakthroughs in head-worn displays to deliver real-time subtitles in the wearer's line of sight on our AR glasses prototypes. We began testing these new experiences in the real world and showcased initial reactions at the Google I/O developer conference to share our progress and advance the technology into further product development. How we use computers and access knowledge continues to evolve, and we believe AR has significant potential as a new frontier in computing, not only to augment our interactions with the world but also to strengthen human relationships.

Describe the idea

Recent advances in language technologies have enabled new applications for speech translation on mobile devices. The ‘Wearable Subtitles’ concept was born when hearing-accessibility researchers and AR researchers brainstormed more seamless interfaces for the deaf and hard-of-hearing community. Prototyping and studies showed how transcription on glasses could have a transformational impact: privately transcribed text, hands-free use, improved mobility, and socially acceptable interactions with maintained eye contact. Encouraged, the team expanded into translation to help people more broadly understand spoken languages. The product team brought the innovation to everyday glasses, powered by breakthroughs in display technology, sound perception, speech recognition, and translation. Testing in lab and real-world settings confirmed the potential. After the Google I/O presentation, people around the world reached out to participate. The film and the response gave us confidence to continue building helpful, everyday glasses that provide transformational value for anyone looking to overcome language barriers.

What were the key dates in the development process?

2018. Wireless wearable prototype developed. Wins first prize in internal perception hackathon.

2019. Expert interviews with the deaf and hard-of-hearing community. Pilot studies with deaf/hard-of-hearing individuals.

2020-09-29. Patent application WO2022071975A1 filed. Fitting mechanisms for eyewear with visible displays and accommodation of wearer anatomy.

2020-10-13. Patent application WO2022081141A1 filed. Distributed sound recognition using a wearable device.

2020-10-13. Patent application EP4004682A1 filed. Distributed sound recognition using a wearable device.

2020-10-20. “Wearable Subtitles” research paper published at ACM UIST 2020. Wins Best Demo Honorable Mention Award.

2021. Expansion into translation and large language models for advanced semantics.

2022-05-11. Presented as the main feature at the Google I/O 2022 keynote.

Describe the innovation / technology

Humans have an innate ability to understand where sound is coming from, who is speaking and what they are saying, and what to pay attention to and what to ignore. AR glasses let us place multiple microphones around the frames to perceive the environment from the user's own point of view. This is a key differentiator from other form factors: earbuds cannot provide the required baseline between microphones, and a mobile phone's microphones listen without any fixed relationship to the user's perspective.
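To make the baseline point concrete, here is a minimal sketch, not Google's implementation, of how a fixed two-microphone spacing on a glasses frame lets software estimate where a speaker is. The 0.15 m baseline and 16 kHz sample rate are illustrative assumptions; with a head-anchored baseline, the time difference of arrival (TDOA) between microphones maps directly to a direction.

```python
# Minimal sketch (not Google's implementation): estimating where a
# voice comes from using the time difference of arrival (TDOA)
# between two microphones on a glasses frame. The 0.15 m baseline
# and 16 kHz sample rate are illustrative assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C
MIC_BASELINE = 0.15     # assumed mic-to-mic spacing on the frames, meters
SAMPLE_RATE = 16_000    # Hz

def estimate_bearing(left: np.ndarray, right: np.ndarray) -> float:
    """Bearing of a sound source in degrees.

    0 = straight ahead; positive = toward the right microphone.
    Cross-correlation finds the inter-mic delay; far-field geometry
    gives sin(theta) = delay * c / baseline.
    """
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)  # delay in samples
    delay = lag / SAMPLE_RATE                      # delay in seconds
    # Clamp to the physically possible range before taking arcsin.
    sin_theta = np.clip(delay * SPEED_OF_SOUND / MIC_BASELINE, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Example: the left mic hears the tone three samples late, so the
# source sits toward the right; the estimate is a positive angle.
t = np.arange(1600) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440 * t)
print(estimate_bearing(np.roll(tone, 3), tone))
```

A production system would fuse more than two microphones with beamforming and noise suppression, but the geometry above is why a head-worn array can do what earbuds and phone microphones cannot.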

We pick up speech from the world using first-person microphones on the glasses and run it through our audio perception pipeline to isolate specific speech and identify what needs to be translated. Our platform leverages advancements in AI to translate the speech using state-of-the-art speech-to-text, automatic language detection, and neural machine translation models. We apply advancements in large language models to semantically enhance the presentation in the interface through speech understanding. Finally, we render the text on a new form factor of everyday glasses with a private display, allowing us to create AR language experiences that display live captions for the world so users can stay hands-free, eyes-up, and focused on connecting with the people around them.
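As a rough outline of the capture-to-display loop just described, the sketch below is hypothetical: every stage function is an invented stub standing in for a proprietary model (audio perception, speech-to-text, language identification, neural machine translation), and none of these names are real Google APIs.

```python
# Hypothetical outline of the capture -> transcribe -> translate ->
# display loop. Every stage function is an invented stub standing in
# for a proprietary model; none of these names are real Google APIs.
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

@dataclass
class Caption:
    text: str
    source_lang: str

def isolate_speech(chunk: bytes) -> Optional[bytes]:
    """Audio perception: keep speech aimed at the wearer, drop noise (stub)."""
    return chunk or None

def transcribe(speech: bytes) -> str:
    """Speech-to-text model (stub)."""
    return "hola, mucho gusto"

def detect_language(text: str) -> str:
    """Automatic language identification (stub)."""
    return "es"

def translate(text: str, src: str, dst: str) -> str:
    """Neural machine translation model (stub)."""
    return "hi, nice to meet you"

def caption_stream(chunks: Iterable[bytes], target_lang: str = "en") -> Iterator[Caption]:
    """Yield a display-ready caption for each chunk of microphone audio."""
    for chunk in chunks:
        speech = isolate_speech(chunk)
        if speech is None:
            continue                                  # nothing worth captioning
        text = transcribe(speech)
        src = detect_language(text)
        if src != target_lang:
            text = translate(text, src, target_lang)  # translate only across languages
        yield Caption(text, src)                      # renderer draws this in line of sight

for cap in caption_stream([b"\x00\x01"]):
    print(f"[{cap.source_lang}] {cap.text}")
```

Treating the audio as a stream of short chunks is what keeps captions live in a sketch like this: each chunk is perceived, transcribed, and translated independently, so text can appear in the display while the speaker is still talking.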

Describe the expectations / outcome

Our goal was to create meaningful value for the millions of people globally who face challenges accessing the spoken language around them: people who are deaf and hard of hearing, expats, language learners, multilingual households, and travelers. To address this, we wanted to establish the usefulness of AR glasses and open the door for more experiences to be built in this category. After debuting at I/O, we confirmed that language experiences on AR glasses can transform many lives and change how people interact with computing and access information. This led to public testing efforts and further investment in other AR glasses experiences. This matters for the AR/VR industry, which has long been searching for a scalable use case. We hope to let technology fade into the background, so people can stay grounded in reality while still getting the information they need, in the context they need it.

Is there any cultural context that would help the jury understand how this work was perceived by people in the country where it ran?

Access to language is a growing problem worldwide. In the US, the National Institutes of Health has reported that 15% of adults have hearing loss that affects daily life, and this number is likely to rise over time because hearing loss increases with age.

Tools such as captioning, interpreters, and hearing-assistive devices exist to help deaf and hard-of-hearing (D/HH) individuals. Captioning is popular among D/HH viewers because it lets them follow a program's dialogue and on-screen action simultaneously. The US also faces a shortage of sign language interpreters and qualified transcribers; the Bureau of Labor Statistics predicted a 46% increase in demand for interpreters from 2012 to 2022. Shortages exist across settings, including schools, courts, hospitals, and business meetings. Transcription on AR glasses can support D/HH individuals without drawing the unwanted attention associated with hearing devices (e.g., fear of discrimination, robbery, or appearing unfashionable).
