10 things every voice app should do

What you'll learn by reading this:

1. Do one thing really well
2. Make your name memorable
3. Focus on intents, not commands
4. Simplify choices
5. Use the one-breath test
6. Include a variety of responses
7. Handle the unexpected gracefully
8. Make enhancements based on data
9. Provide contextual help
10. Beta test with real users

1. Focus on doing one thing really well

Jeff: An example of this that I like to use is a skill I built called Games Back. Obviously, we don't have a lot of sports right now, but with baseball, one of the things that I always care about each morning [...] is the standings and how my team is doing. My team happens to be the Cleveland Indians, and so I jump in and want to see how many games back they are. That's the column that I really care about more than anything else. If they won or lost last night - that's great and all - but what really matters is how many games back from first [they are].

That's the kind of thing that I'm looking for. So what I did is I built a skill that specifically gives me that one piece of information. So if you have an Alexa device at home, you could say, "hey Alexa, ask Games Back about the Cleveland Indians," and she'll tell you what the standings were at the end of last season [since] the data hasn't reset yet [and] we don't have a season this year.

The idea [here] is that I'm doing that one thing. I'm not telling you about wins and losses or when their next game is or how they did yesterday. It's just focused on that one simple task. You could continue to add things to it, but this is the core idea. We really want to do one thing really, really well.

2. Make your name memorable

Jeff: This is kind of a fun story to tell.

There's a great skill in the Alexa skill store called The Magic Door. The way they advertise their skill [is they have] a door as their icon, and they tell people to say, "open the magic door." [When you do so], you hear this creaky old door open, and then you enter their adventure.

The key to this is if you think about any app on your phone — there's an icon and a name. And as I'm flipping through my phone, I can easily find what I'm looking for because I have it in a list.

With voice, we don't have that luxury. So we need to be specific and focused on the question, "how do people remember to use our stuff?" These are some example names [which] I made up. You can see that there are lots of different ways to refer to a skill. So I might say, "run" or "resume" or "load" or "launch." All of these words mean the same thing; it's just a matter of how you tie them together with whatever your name is to allow the user to find your skill easily.

So, "launch Rocket Ship Stories" might make a whole lot of sense, or "open Box Maker," if that's the name of your skill. In each of these cases, you can play with this to find the launch word that matches up with what you're trying to accomplish.

I [created] a skill that gives you three clues, all of which have something in common. Then you have to figure out what those things are. And so it's a trivia game [that] challenges your mind.

And so I thought — the word 'trivia' has the letters t-r-i in front. Maybe I should use that [in the name]. So I thought about flipping the words and using 'Viatri' which [happens to mean] "by way of three" — which I thought was a really clever name for it.

I realized very quickly that no one's going to remember this name. [It's] not memorable. It's not useful. It's not something that is going to stick. I need [users] to remember how to play my game, or they're not going to. And so ultimately, I came up with the idea to call it Three Clues, which made a whole lot more sense and makes it very, very easy for people to remember what the [app] is.

"It's the game that gives you the three clues!"

It's really simple. So think carefully about your name. It's very tempting — especially when you have a startup or a small business — to have some crazy name. Most of the time, starting from scratch, having a memorable name can be hard. So you want to think deeply about that.

3. Focus on intents, not commands

Jeff: Let me explain what I mean by that. I'm working on a skill right now called Dev Tips, [and] it's meant for Alexa developers to be able to get answers to all their questions about Alexa's skill-building but in a meta-way. [For example, you can] hear the sound effects available in the library or [find out] all the speechcons you can use. I want you to be able to experience [...] all of the different kinds of sound effects and audible things inside an Alexa skill [...] without having to build something.

And so I started with commands like, "ask Dev Tips about monetization," or "tell me about persistence." And what I realized quickly is that I have a database of all of these terms, but the users don't. They don't know that monetization is a thing they can ask about. They [also] don't know that persistence is in [that] list.

And so while these things do work, I also had to enable things like, "teach me something new," or "play a speechcon for me," [or] "what should I learn next?" By giving those kinds of capabilities inside my skill, it makes it easy for somebody to explore.

You can't assume that [the user] knows what to say or that they know what your content is. It's much easier to give commands like this, so that they can continue to move and grow inside your skill.
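
As a rough illustration (not code from the talk), the idea can be sketched in plain Python. The intent names, the phrase list, and the placeholder answers below are all assumptions made for the example; a real voice platform does this matching for you from its interaction model.

```python
import random

# Placeholder content: the real skill's topics and answers are not shown in the talk.
TOPICS = {
    "monetization": "Placeholder answer about monetization.",
    "persistence": "Placeholder answer about persistence.",
}

# Phrases that express "I want to explore" without naming a topic.
EXPLORE_PHRASES = {"teach me something new", "what should i learn next"}


def resolve_intent(utterance):
    """Return (intent_name, topic_or_None) for a spoken phrase.

    Intent names are hypothetical and exist only for this sketch.
    """
    text = utterance.lower().strip().rstrip("?.!")
    if text in EXPLORE_PHRASES:
        return "ExploreIntent", None
    for topic in TOPICS:
        if topic in text:
            return "TopicIntent", topic
    return "FallbackIntent", None


def respond(utterance, rng=random):
    """Answer a phrase; exploratory phrases get a randomly chosen topic."""
    intent, topic = resolve_intent(utterance)
    if intent == "ExploreIntent":
        # The user does not need to know which topics exist.
        topic = rng.choice(sorted(TOPICS))
    if topic is None:
        return "You can say, teach me something new."
    return TOPICS[topic]
```

The point of the sketch is that "teach me something new" works even when the user has never heard the word "persistence".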

4. Simplify choices

Jeff: The reason for this is because we don't often think about a conversation in a deep, meaningful way. And if you think about it, most conversations are a back and forth, bouncing between each other. Maybe [there are] interruptions. Sometimes there are questions. But when you [do] ask that question, there's a variety of answers you could give, right?

So what are all the questions we can ask, and how should we think about asking those questions?

The big one for me is that we should lead with open-ended questions. These [examples] are NOT open-ended:

"Is there something else I can help you with?"
"Do you have another question?"
"Would you like to know something else?"

The answers to all of those questions are yes or no — those are the two cases. If they say yes, you're going to end up saying one of these:

"What else would you like to know about?"
"What else can I help you with?"
"What topic can I assist you with?"

[You don't want to] ask yes or no questions unless it absolutely makes sense. Instead, ask [open-ended] questions, like in the examples above, that get [the user] to the point where they know exactly what they want to do.

Imagine I have a fruit stand, and I've built a skill for it where I say things like, "we have apples, bananas, oranges, lemons, grapes, kiwis, blackberries, strawberries, and mangoes." And then [after] I say, "which fruit do you want?"

The challenge with asking a question this way is that we've given [the user] a very long list, and it's hard to figure out exactly which [fruit] I might want. By the time you say mangoes, I've already forgotten that bananas are in this list.

In addition to that, I asked the question afterward, and because of this, [the user] doesn't know that you were asking [them] to pick one. [Initially], you were just telling [them] stuff and [so] they weren't paying as much attention.

Let's limit our choices and ask the question first.

So [instead] say, "what fruit do you want — apples, bananas, or oranges?"

You could probably also tweak this to say apples, bananas, oranges, or something else. And if they say something else, or if they say 'grapes', then great! [You can then] go execute what they want to do with [their choice being] grapes.
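
A minimal sketch of "ask first, then offer a short list" might look like the following. The function name `choice_prompt` and the cutoff of three spoken options are assumptions for illustration, not something prescribed in the talk.

```python
def choice_prompt(question, options, max_spoken=3):
    """Build a prompt that asks the question first, then offers a few options."""
    spoken = list(options[:max_spoken])
    if len(options) > max_spoken:
        # Leave the door open for everything that was not read aloud.
        spoken.append("something else")
    if len(spoken) > 2:
        listing = ", ".join(spoken[:-1]) + ", or " + spoken[-1]
    else:
        listing = " or ".join(spoken)
    return f"{question}: {listing}?"


FRUITS = ["apples", "bananas", "oranges", "lemons", "grapes",
          "kiwis", "blackberries", "strawberries", "mangoes"]

print(choice_prompt("Which fruit do you want", FRUITS))
# Which fruit do you want: apples, bananas, oranges, or something else?
```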

5. Use the One-Breath Test

Jeff: Here's a really good example of this:

I built a Star Wars skill that has lots and lots of content about everything you could imagine in the Star Wars universe. [When I added] Luke Skywalker to my content, I found a really good description of [him]:

Luke Skywalker was a Tatooine farmboy who rose from humble beginnings to become one of the greatest Jedi the galaxy has ever known. Along with his friends Leia and Han Solo, Luke battled the evil empire, discovered the truth of his parentage, and ended the tyranny of the Sith. A generation later, the location of the famed Jedi master was one of the galaxy's greatest mysteries.

[This description] is really long, and so a good rule of thumb is if you can't say it in one breath — it's too long. This description is probably two or three breaths.

What I did instead, as an example, is that if you ask the skill about Darth Vader, [Alexa] says, "Darth Vader is a bad, bad, bad, man." Very simple. Very to-the-point. A little humor in there. And so, that's the thing that we're really trying to accomplish.

If people want more, you could offer that, but I wouldn't lead with tons of content. You often lose people's attention.
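
The one-breath test is really something you do by reading a response aloud, but a crude automated check can flag obvious offenders. The sketch below assumes a limit of 30 words per breath, which is an arbitrary number chosen for illustration and should be tuned by ear; the function names are hypothetical.

```python
import re

MAX_WORDS_PER_BREATH = 30  # assumption: tune this by reading responses aloud


def fits_one_breath(text, max_words=MAX_WORDS_PER_BREATH):
    """Return True if the response is short enough under the assumed limit."""
    return len(text.split()) <= max_words


def split_for_speech(text, max_words=MAX_WORDS_PER_BREATH):
    """Split text into (say_now, offer_later) at a sentence boundary.

    The first sentence is always spoken, even if it is over the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    spoken, count = [], 0
    for i, sentence in enumerate(sentences):
        words = len(sentence.split())
        if spoken and count + words > max_words:
            return " ".join(spoken), " ".join(sentences[i:])
        spoken.append(sentence)
        count += words
    return " ".join(spoken), ""


print(fits_one_breath("Darth Vader is a bad, bad, bad, man."))  # True
```

The second return value of `split_for_speech` is the part you could hold back and offer only if the user asks for more.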

6. Include a variety of responses

Jeff: This is something that is really important to me, and something I do all over the place.

When we build applications for mobile or the web, you often find specific rules that they want you to follow. [For example, you have] the iOS style guide [and] the Android style guide. They [use concepts] like predictable and consistent, and those things are 100 percent the opposite of what we want voice to be.

With voice, it needs to be unpredictable. It needs to be something that you are engaged with and that you are actually listening to or choosing to play along with. And so what I like to do, is keep everything fresh and random. For every single thing that my skill can say, I like to have 5 to 7 versions of that thing.

You can do this with everything. Vary the order [or] the content you're using to keep people on their toes and keep it from being predictable. You'll find that they stay engaged and that they're participating more in conversation with their voice system.
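
One simple way to get this variety, sketched here under the assumption that responses are stored as plain lists of strings, is to pick a random variant each time and avoid repeating the one used last. The variant texts and the `pick_variant` helper are made up for the example.

```python
import random

# Hypothetical variants of a single welcome prompt (the talk suggests 5 to 7).
WELCOME_VARIANTS = [
    "Welcome back! What would you like to hear about?",
    "Hi again. What can I tell you about?",
    "Good to hear from you. What are you curious about?",
    "Hello! What topic should we start with?",
    "Hey there. What would you like to know?",
]


def pick_variant(variants, last_used=None, rng=random):
    """Pick a random variant, skipping the one used last time when possible."""
    candidates = [v for v in variants if v != last_used] or list(variants)
    return rng.choice(candidates)


first = pick_variant(WELCOME_VARIANTS)
second = pick_variant(WELCOME_VARIANTS, last_used=first)  # never equals first
```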

The other thing I like to do is randomize the order in which you present things. For example, if you're building something for a travel agency, there are five things you need from the user to run a flight search: origin, destination, airline, departure date, and return date.

Where are you flying from?

Where are you flying to?

What airline do you want to use?

When are you flying?

When do you want to return?

Now you may always ask the questions in this order, but who's to say that the second time they come back [and complete this], we [have to use the same order]?

It doesn't matter what the order of these things is. We know that we want all five of those values. So by mixing it up, it doesn't feel automated. It doesn't feel like I'm filling out a form with my voice. It feels like I'm having a conversation with someone, and I'm just answering their questions.

So this is another really good example of how we want to think about randomizing the order that we do things, and not only the order that we say things.
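
The flight-search example can be sketched as picking the next question at random from whatever is still missing. The slot names and the `next_question` helper are assumptions for illustration; the question wording comes from the list above.

```python
import random

# One question per value the flight search needs. Slot names are hypothetical.
QUESTIONS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "airline": "What airline do you want to use?",
    "departure_date": "When are you flying?",
    "return_date": "When do you want to return?",
}


def next_question(filled, rng=random):
    """Return (slot, question) for a random unfilled slot, or None when done."""
    missing = [slot for slot in QUESTIONS if slot not in filled]
    if not missing:
        return None
    slot = rng.choice(missing)
    return slot, QUESTIONS[slot]


# Simulated conversation: the order differs from run to run,
# but all five values are always collected.
answers = {}
while (step := next_question(answers)) is not None:
    slot, question = step
    answers[slot] = f"<user's answer to: {question}>"
```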

7. Handle the unexpected gracefully

Jeff: One example of this is errors. Errors are going to happen from time to time. Maybe an API is down. Maybe you built something that didn't work properly. [Although] Alexa has default error messages [such as], "there was a problem with the requested skill's response," you should control and manage these to make it seem like something accidentally happened [and that] you're handling it.

Here's an example of that:

"It seems our trivia questions are better than our software developers. Something is broken, but we've alerted our team to the problem. Can I offer you a random trivia question instead?"

So, the trivia questions stuff is working, but whatever they were trying to do is broken, so let's instead think about how we redirect them, maybe adding a little humor to it [as well]. And then just carry [the conversation] back to where they were going. [This] is something really important to think about, but not something people pay a lot of attention to. There are always opportunities for things to go wrong, and having some fun with it makes a big difference.

[Another] example of this is my baseball skill. If I came in and asked about the Cleveland Indians, the New York Yankees, or the San Francisco Giants, my skill was ready for all of that.

But what if they say the New York Jets? What do I do then? I don't have the New York Jets in my database [because it is a football team]. I don't want to throw an error. I need to do something with this information. So what I do in my case is I say, "I'm sorry, the New York Jets aren't a baseball team. Is there a different team I can tell you about?"

I'm redirecting them [instead] of saying, "I don't know what to do." So this is a really good example of how you can redirect people to the thing they should be doing.
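
Both cases, an unknown team and a backend failure, can be handled in one place. This is only a sketch: the team list is truncated to the three teams mentioned above, and `fetch_games_back` is a hypothetical callable standing in for whatever data source the skill actually uses.

```python
# Only the teams named in the talk; a real skill would list every team.
BASEBALL_TEAMS = {"cleveland indians", "new york yankees", "san francisco giants"}

REDIRECT = "Is there a different team I can tell you about?"


def standings_response(team, fetch_games_back):
    """Build a spoken response, redirecting instead of failing.

    fetch_games_back is a hypothetical function: team name -> games back.
    """
    name = team.strip()
    if name.lower() not in BASEBALL_TEAMS:
        # Unknown input: explain and steer back to something that works.
        return f"I'm sorry, the {name} aren't a baseball team. {REDIRECT}"
    try:
        games_back = fetch_games_back(name.lower())
    except Exception:
        # Backend failure: own the problem and keep the conversation going.
        return f"Something is broken on our end, but we've alerted our team. {REDIRECT}"
    return f"The {name} are {games_back} games back."
```

Every branch ends either with an answer or with a question the user can respond to, so the conversation never dead-ends.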

8. Make enhancements based on data

Jeff: Publishing a skill is not the end. It's more or less the beginning. As you're building, you'll see that you can collect a whole bunch of data about what's going on inside your skill. What are people saying? What are people asking for?

Anytime that I'm collecting data from a user, I always persist that to a database so that I can compare later and say, "what are people asking my skill for that I'm not accommodating?" By parsing through that data and looking at it in a deep way, we can say, "oh, we should be fixing this!"

Another good example is intents. If we think about my Dev Tips skill, we have a get news intent, we have an answer intent, we have a launch request [which is where] they start. And [I] can see that the get news intent seems to be the most popular [as opposed] to the display template intent [which is] very infrequently used.

So when I ask myself what I'm going to build next, the answer should be focused on what people use the most and how I can continue to provide value to my users.
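
A minimal sketch of that kind of analysis, assuming each request has been persisted as an (intent, utterance) pair: the log entries below are invented sample data, and the intent names (including `FallbackIntent` for anything the skill could not handle) are hypothetical.

```python
from collections import Counter

# Invented sample data; a real log would come from your database.
REQUEST_LOG = [
    ("LaunchRequest", ""),
    ("GetNewsIntent", "what's new"),
    ("GetNewsIntent", "give me the news"),
    ("AnswerIntent", "tell me about persistence"),
    ("FallbackIntent", "tell me about widgets"),
    ("FallbackIntent", "Tell me about widgets"),
    ("DisplayTemplateIntent", "show me a template"),
    ("GetNewsIntent", "any news"),
]


def summarize(log):
    """Return (intent usage counts, counts of phrases the skill did not handle)."""
    usage = Counter(intent for intent, _ in log)
    unhandled = Counter(
        utterance.lower() for intent, utterance in log if intent == "FallbackIntent"
    )
    return usage, unhandled


usage, unhandled = summarize(REQUEST_LOG)
print(usage.most_common(1))      # the most-used intent
print(unhandled.most_common(3))  # what people ask for that isn't supported yet
```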


9. Provide contextual help

Jeff: This is something that we all skip over, but it's always required. We have to provide help inside of our skill.

We often do things like this:

"This skill lets you order from our big menu of pizzas, breadsticks, pasta, sandwiches, wings, and desserts. What would you like to do?"

All I've done here is given [the user] another opportunity to ask for something. I haven't actually helped them. If they were in the middle of ordering a pizza and weren't sure how to get double sauce, the above example doesn't give them any of that assistance [if they ask for help].

What help should look like is this:

"It looks like you're trying to order a pizza. You can add or remove toppings by saying things like add pepperoni or remove anchovies. You can also add extras like sauces, extra cheese, or our different crusts. Just ask for them! Do you want to get back to your pizza, or do you need help with something else?"

It's possible that they didn't need help with the thing they were doing, and in that case, you could try to redirect them to the appropriate kind of help that they're looking for.

Help is super important because people get lost. There's no UI. There are no visual clues to let them know that they're supposed to click this button or go over here. They're only using their voices, and sometimes they just can't figure out exactly what that thing should be.
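
One way to make help contextual, sketched here under the assumption that the skill tracks what the user is currently doing in its session state, is to key help messages by that task. The `current_task` key, the task name, and the general help wording are hypothetical; the pizza help text is adapted from the example above.

```python
# Help messages keyed by what the user is in the middle of doing.
HELP_BY_TASK = {
    "ordering_pizza": (
        "It looks like you're trying to order a pizza. You can add or remove "
        "toppings by saying things like add pepperoni or remove anchovies. "
        "Do you want to get back to your pizza, or do you need help with "
        "something else?"
    ),
}

# Fallback used only when there is no task in progress.
GENERAL_HELP = "You can order pizzas, pasta, wings, and more. What would you like help with?"


def help_response(session):
    """Return help for the task in progress, or general help if there is none."""
    task = session.get("current_task")
    return HELP_BY_TASK.get(task, GENERAL_HELP)


print(help_response({"current_task": "ordering_pizza"}))
```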

10. Beta test with REAL users

Jeff: I can't emphasize this enough. If you're not getting your skill out in front of real people, then you're never going to feel like it's as successful as it could have been. This is because each of us has a [different] idea about how people speak.

We need to get other people in front of us and talking to our skill. We need them to help us understand what's going on. And what you really want to do is get people you know to use this stuff. Invite your friends and your family, people who will give you open and honest feedback.

A lot of times, if you open it up to people you met online or a large mailing list, very few people will tell you anything. They may use your skill a couple of times, but they're probably not going to give you the kind of meaningful feedback you're looking for.

If you're working in your offices on voice stuff, the thing I recommend is taking the Wizard of Oz approach. There are two lessons to be learned here.

The first is that if you're trying to build something that replicates a mobile app or a website, a good rule of thumb is that if something takes more than three clicks to get to, and is a place the user wants to go regularly, you should think about building it for voice.

So I always think about the three clicks on the Ruby Slippers (Wizard of Oz) as a way to remember that those are the things we should really be focusing on. Conversely, if they can get to it [quickly, or in fewer than three clicks] by opening your app, that's probably going to be faster. So it's something to keep in mind.

As far as testing goes, you should have somebody behind the curtain — like the wizard [in The Wizard of Oz], who knows what the skill should do. Then, you have people come up and talk to the curtain so they can't see any facial expressions or body language. They ask the skill to do things, and the skill — or the person behind the curtain in this case — responds in the way that the skill is programmed to.

You'll learn very quickly how people want to interact with this content and how they want to interact with this skill.

More on Jeff Blankenburg:

Full recording below: