Cannes Lions

Subtitles for the world

GOOGLE MARKETING, Los Angeles / GOOGLE / 2023

Overview

Background

Language is fundamental to connecting with the people around us. Yet understanding someone who speaks a different language, or trying to follow a conversation if you are deaf or hard of hearing, can be a real challenge. We saw an opportunity for augmented reality (AR) to overcome these challenges. We therefore combined our advancements in translation technologies with hardware breakthroughs in headworn displays to deliver real-time subtitles in your line of sight on our AR glasses prototypes. We began testing these new experiences in the real world and showcased initial reactions at the Google I/O developer conference to share our progress and advance this technology into further product development. How we use computers and access knowledge continues to evolve, and we believe that AR has significant potential as a new frontier in computing, not only to augment our interactions with the world but also to strengthen human relationships.

Idea

Recent advances in language technologies have enabled new applications for speech translation on mobile devices. The ‘Wearable Subtitles’ concept was born when hearing-accessibility researchers and AR researchers brainstormed more seamless interfaces for the deaf and hard-of-hearing community. Prototyping and studies showed how transcription on glasses could have a transformational impact through privately transcribed text, hands-free use, improved mobility, and socially acceptable interactions with maintained eye contact. Encouraged, the team expanded into translation to help people more broadly understand spoken languages. The product team brought the innovation to everyday glasses, powered by breakthroughs in display technology, sound perception, speech recognition, and translation. Testing in lab and real-world settings confirmed the potential. After the Google I/O presentation, people around the world reached out to participate. The film and the response to it gave us confidence to continue building helpful, everyday glasses that provide transformational value for anyone looking to overcome language barriers.

Strategy

Our goal was to create meaningful value for the millions of people globally who experience challenges accessing the spoken language around them: people who are deaf or hard of hearing, expats, language learners, multilingual households, and travelers. To address this, we wanted to establish the usefulness of AR glasses and provide an opportunity for more experiences to be built in this category. After debuting at I/O, we confirmed that language experiences on AR glasses can transform the lives of many and change how people interact with computing and access information. This led to launching public testing efforts and more investment in other AR glasses experiences. This matters for the AR/VR industry, which has long been searching for scalable use cases. We hope to enable technology to fade into the background, so people can stay grounded in reality while still getting the information they need, in the context they need it.

Execution

Humans have an innate ability to understand where sound is coming from, who is currently speaking and what they are saying, and what to pay attention to and what to ignore. AR glasses allow us to place multiple microphones around the frames to perceive the environment from the user’s own point of view. This ability is a key differentiator over other form factors: earbuds cannot provide the physical baseline required between multiple microphones, and a mobile phone’s microphones listen with no fixed relationship to the user’s perspective.
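
To make the microphone-geometry point concrete, the sketch below shows the basic time-difference-of-arrival idea for a single pair of glasses-mounted microphones: the tiny gap between when a sound reaches each microphone reveals the direction it came from. This is an illustrative Python sketch, not Google’s implementation; the sample rate, microphone spacing, and the sign convention for the bearing are all assumptions.

import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second at room temperature
SAMPLE_RATE = 16_000    # Hz; a typical speech-capture rate (assumed)
MIC_SPACING = 0.14      # metres between the two temple microphones (assumed)

def estimate_bearing(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the bearing of the dominant sound source, in degrees from
    straight ahead, by cross-correlating the two microphone channels.
    Positive angles point toward the right microphone in this convention."""
    corr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    tdoa = lag_samples / SAMPLE_RATE  # arrival-time difference in seconds
    # Far-field approximation: the delay maps to the sine of the bearing.
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Toy check: a tone that reaches the right microphone about 0.2 ms before
# the left one should yield a bearing on the right-hand side.
t = np.arange(0, 0.05, 1 / SAMPLE_RATE)
tone = np.sin(2 * np.pi * 440 * t)
delay = int(0.0002 * SAMPLE_RATE)
left = np.concatenate([np.zeros(delay), tone])
right = np.concatenate([tone, np.zeros(delay)])
print(f"estimated bearing: {estimate_bearing(left, right):.1f} degrees")

Production systems use more microphones and far more robust signal processing, but this geometry is why a frame-mounted array can perceive sound from the wearer’s own perspective in a way earbuds and phones cannot.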

We pick up speech from the world using first-person microphones on the glasses and run it through our audio perception pipeline to isolate specific speech and identify what needs to be translated. Our platform leverages advancements in AI to translate the speech using state-of-the-art speech-to-text, automatic language detection, and neural machine translation models. We apply advancements in large language models to semantically enhance the presentation in the interface through speech understanding. Finally, we render the text on a new form factor of everyday glasses with a private display, allowing us to create AR language experiences that display live captions for the world so users can stay hands-free, eyes-up, and focused on connecting with the people around them.
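
To illustrate that flow, here is a minimal sketch of such a captioning pipeline. The function names and canned return values are hypothetical placeholders for the production models (audio perception, speech-to-text, language identification, neural machine translation, and the in-lens renderer); only the ordering of the stages follows the description above.

from dataclasses import dataclass

@dataclass
class Caption:
    source_text: str    # transcript in the speaker's language
    display_text: str   # text rendered on the in-lens display

def isolate_speech(audio_frame: bytes) -> bytes:
    # Placeholder for the audio-perception stage that separates the speech
    # to caption from background sound picked up by the microphone array.
    return audio_frame

def transcribe(speech: bytes) -> str:
    # Placeholder for the speech-to-text model.
    return "¿Dónde está la estación de tren?"

def detect_language(text: str) -> str:
    # Placeholder for automatic language identification.
    return "es"

def translate(text: str, source: str, target: str) -> str:
    # Placeholder for the neural machine translation model.
    return "Where is the train station?"

def render_on_lens(text: str) -> None:
    # Placeholder for the private, line-of-sight display.
    print(f"[lens] {text}")

def caption_frame(audio_frame: bytes, user_language: str) -> Caption:
    """Turn one buffered chunk of microphone audio into a rendered caption."""
    speech = isolate_speech(audio_frame)    # 1. perceive and isolate speech
    transcript = transcribe(speech)         # 2. speech-to-text
    spoken = detect_language(transcript)    # 3. automatic language detection
    display = (translate(transcript, spoken, user_language)
               if spoken != user_language else transcript)  # 4. translate when needed
    render_on_lens(display)                 # 5. render in the wearer's line of sight
    return Caption(transcript, display)

caption_frame(b"...", user_language="en")

On a real device this loop would run continuously on streaming audio, with the large-language-model step mentioned above enhancing how the text is presented before it reaches the display.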

Similar Campaigns

Picture what the cloud can do 2

ELEVEN, San Francisco

2019, GOOGLE