Designing for Voice — What I Learned While Making Train Track

Eli Horne
Jan 16, 2018

Every morning, my wife Liz and I interact with a disembodied voice that lives with us in our small Brooklyn apartment. Without skipping a beat, the voice chirps back and provides useful answers to most questions.

Nerd joke.

If you’ve read any of the recent New York Times articles on the state of the MTA, our public transportation system, you know that the trains are often delayed.

While getting ready to leave for work, Liz asked a new question: “Is the L train currently delayed?” and to our surprise, the voice didn’t know how to answer.

Now, I’m a self-taught code bumbler who can search Stack Overflow as well as the next person, so I said to myself: “I can make this!”

Half-way through the project.

It took about 48 hours of cursing, and 2 weeks of thumb twiddling while I waited for the app to get approved, but Train Track is live and available on a device near you.

You can now ask your Google Home, Android phone, or the Google Assistant app on iOS for the real-time status of your favorite MTA line:

Ok Google, ask Train Track if the L train is delayed

I kid. It actually tells you the real-time status. Usually.

Designing for voice

By day I’m a product designer, pushing pixels around services you’ve probably used on your phone or laptop. I obsess over what you should and shouldn’t see. It’s not a hard science, but there are some general guidelines to designing for visual mediums — provide scannable visual hierarchy, make what’s clickable/tappable obvious, and provide instructional copy.

None of these concepts apply to voice design.

Many ways to do the same thing

Generally, visual applications strive for a canonical way to do something. One button to send the email you are composing. One button to place that order for midnight burritos on Seamless.

With voice actions, you need to anticipate all the ways a person might think to ask for those same features, because getting it wrong often means starting the task over from the beginning. It’s like predicting all the questions someone will ask, and all the ways they might ask them, before a conversation even begins.

Case in point: Train Track’s primary feature is checking a specific train’s real-time status and reporting back. In the visual design world, you could imagine a button saying “Check status” next to the selected train line.

But in voice land, someone might say:

Is the L train on time?

or

If the L train is delayed.

or

L train running?

These are all valid questions that should have the same answer.
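To make that concrete, here is a toy sketch of how those three phrasings can collapse into one request. It is not how Train Track or Google's platform actually works (real assistants match phrasing with trained language models, not word lists). The `StatusIntent` type, the `LINES` list, and `parseStatusQuery` are all made-up names for illustration.

```typescript
// Hypothetical intent shape for illustration; not part of any real assistant SDK.
type StatusIntent = { intent: "check_status"; line: string };

// Assumed list of subway line names, used only to recognize the word before "train".
const LINES = new Set([
  "1", "2", "3", "4", "5", "6", "7",
  "A", "C", "E", "B", "D", "F", "M",
  "G", "J", "Z", "L", "N", "Q", "R", "W", "S",
]);

function parseStatusQuery(utterance: string): StatusIntent | null {
  // Normalize: uppercase, replace punctuation with spaces, split into words.
  const words = utterance
    .toUpperCase()
    .replace(/[^A-Z0-9 ]/g, " ")
    .split(/\s+/)
    .filter(Boolean);

  // Any "<line> TRAIN" pair counts as a status check, however the rest is phrased.
  for (let i = 0; i < words.length - 1; i++) {
    if (words[i + 1] === "TRAIN" && LINES.has(words[i])) {
      return { intent: "check_status", line: words[i] };
    }
  }
  return null; // Nothing recognizable; the app should ask the person to rephrase.
}
```

All three phrasings above resolve to the same `check_status` request for the L, while something like “What can you do?” returns `null` and can be routed elsewhere.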

Deliver the right information at the right time

Apps and web sites have the added benefit of contextual surroundings to help teach people what to do. Thanks to eye tracking, we can tell that people scan the screen to understand what options are available to them and how to achieve their goals. Voice interactions do not have that benefit.

Eye tracking testing done for HealthCare.gov

Your verbal exchanges need to be concise and to the point. Receiving a specific request is not the time to educate people about other related features or services. Instead, designing for voice means supporting other dedicated avenues like “what can you do?” and “I need help.”

It’s also important to maintain trust by not listening to the microphone longer than necessary. After each question, you have the option to end the conversation, or wait for a follow-up. While it may be tempting to leave the microphone active (track all the things!), this is a bold new frontier where people are getting used to trusting these devices in their lives. Be a good citizen, and know when the conversation is over.
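As a rough sketch of that decision, every response can carry an explicit flag saying whether the microphone should stay open. The `VoiceResponse` shape and its `keepMicOpen` field are hypothetical stand-ins for whatever your platform provides; check its documentation for the real mechanism.

```typescript
// Hypothetical response shape. `keepMicOpen` is an assumed name, not a real SDK field.
interface VoiceResponse {
  speech: string;
  keepMicOpen: boolean;
}

// A specific question gets a specific answer, then the conversation ends.
function answerStatus(line: string, statusText: string): VoiceResponse {
  return {
    speech: `The ${line} train is ${statusText}.`,
    keepMicOpen: false, // Done. Stop listening.
  };
}

// Help is the one place where teaching is welcome and a follow-up is expected.
function answerHelp(): VoiceResponse {
  return {
    speech: "You can ask me whether a train line is delayed. Which line would you like to check?",
    keepMicOpen: true, // We just asked a question, so wait for the reply.
  };
}
```

The status answer closes the session by default; only the help path, which ends on a question, keeps listening.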

How can I make a voice app?

It has never been easier to get involved with assistant technology. Google and Amazon do the heaviest lifting: speech recognition and verbalizing responses. A simple game with preset responses can be put together in minutes. Doing something more advanced (like fetching real-time transportation data and interpreting it) can be done with a little bit of JavaScript knowledge and the patience to sift through Stack Overflow.
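For a sense of scale, the "interpreting it" half can be as small as the sketch below. It assumes the status data has already been fetched and parsed into a simple list. The `LineStatus` shape and the status labels are my assumptions for illustration, not the actual MTA feed format or Train Track's code.

```typescript
// Assumed, simplified shape of one entry in an already-parsed status feed.
type LineStatus = { line: string; status: string };

// Turn a raw status label into a sentence worth saying out loud.
function describeStatus(line: string, feed: LineStatus[]): string {
  const entry = feed.find((e) => e.line.toUpperCase() === line.toUpperCase());
  if (!entry) {
    return `Sorry, I couldn't find the ${line} train.`;
  }
  switch (entry.status.toUpperCase()) {
    case "GOOD SERVICE": // assumed label
      return `The ${entry.line} train is running on time.`;
    case "DELAYS": // assumed label
      return `The ${entry.line} train is currently delayed.`;
    default:
      // Unknown label: read it back as-is rather than guess at its meaning.
      return `The ${entry.line} train is reporting ${entry.status.toLowerCase()}.`;
  }
}
```

Keeping this step separate from the network call means the spoken wording can be tested against sample data without touching a live feed.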

Accenture’s Digital Consumer Survey 2018 on Digital Voice Assistants (DVA)

At the end of 2017, there were roughly 450 million voice-enabled devices in the US. It is projected that by 2022 there will be 870 million, an increase of more than 90%.

This is a new world full of opportunity — one in which the big brands and services haven’t gotten involved. With the launch of every new platform, creative people willing to learn the guidelines get to define its future.

Eli Horne

Product design @ kickstarter, GIF aficionado. Previously pushed pixels at @google, @foodnetwork and @hgtv