If there is one person who understands the ins and outs of great conversation design, it's Cathy Pearl. Pearl, who is the Head of Conversation Design Outreach at Google, joined our live AMA and answered a host of questions submitted by the Voiceflow Community. From the emergence and growth of conversation design to what you should consider when building a VUI designer portfolio, this recap covers a wide range of topics from our one-hour discussion.
Cathy Pearl has been a driving force in voice tech and VUI design for over two decades, creating conversational interfaces for voice, text, and multi-modal since 1999. A world-renowned leader in conversation design, Cathy has helped educate thousands in the voice space by sharing her knowledge through a variety of mediums. She has gained recognition as an international speaker, having presented at events including Cannes Lions International Festival of Creativity, SXSW, O'Reilly Design, Voice Summit, and Project Voice. She has also written numerous articles, blog posts, and her own book, Designing Voice User Interfaces - a great go-to resource for those interested in exploring VUI design.
Cathy: You've got to go back to the mid-90s when speech recognition first became mainstream on automated phone systems. I used to work for a company called Nuance that builds IVRs and these kinds of things. And it was kind of the first chance for regular people to experience speech recognition. And I'll tell one story about that time which some of you may have heard me talk about before.
Nuance was one of the first companies to make speech IVRs and the first client they launched was the financial company, Charles Schwab. And you could call up the system, but the only thing you could do was get stock quotes. That's it. And what I think is interesting is they knew this would obviously save money, because before you had to call a human to get a stock quote because people weren't really on the internet yet.
At one point, they found that the heaviest traders - who usually called their broker once or twice a day - were calling it dozens of times a day. Technically, they could have called their broker as many times as they wanted, but they didn't because it was like, I'm bothering somebody. It's a human. But when they could make a call to a computer, they didn't feel that anymore. It was kind of an unexpected side effect of the technology that they hadn't really anticipated.
Fast forward to the present and obviously things have changed in a lot of ways. Some things haven't changed, but speech recognition has had massive improvements. Of course, we no longer need to write every single grammar. That being said, natural language understanding (NLU) still requires a lot of manual labor. It's not all automated by any means.
And of course, the other great innovation is far-field microphones. When people talk on the phone, you've got a uni-directional mic. You know who is speaking [and] it's a lot easier to do that speech processing. When there's a bunch of people in the room, it's like, is that someone talking? Are they talking to me? It adds a whole layer of complexity, and I'm still astonished, honestly, when I can talk and have my smart speaker actually hear me.
Cathy: Now when I started, there was no such thing as conversation design. That's a pretty new thing. We used to call ourselves VUI [or] voice user interface designers because it was all voice back then.
Now, over time it's evolved into the term conversation designer because now you might be designing for voice and text. All of these different things are part of conversation design.
In terms of responsibilities, when I worked at Nuance, we wore a lot of hats as VUI designers. I would meet with clients and get requirements. I would write prompts, flows, 300-page dialogue design specifications. I would do usability testing [and] traversal testing. I would handwrite grammars and even do some tuning where we got data back and would tweak all the parameters and the recognition engine.
Now, that's changed a lot for most folks. If you're at a small startup, sometimes you are wearing a lot of those hats still, but at a bigger company, like Google, it's much more specialized. People are often either just doing design or they're just working on natural language, or they're just working on TTS.
But a good conversation designer should really still try and understand the technology. How does the actual speech recognition work? It's still really important to have that knowledge in the back of your mind when you're designing. [Today], a conversation designer doesn't just do voice - they've also got to be able to do multimodal, voice only, voice forward, intermodal - like the mobile phone or the car or the watch. So there's a sort of broader skill set that is required.
Cathy: If anyone has seen me give a talk before, they probably know the words that are going to come out of my mouth next, which are sample dialogues. They are really one of the key components in a conversation designer's toolkit.
And for anyone out there who might not be familiar with sample dialogues, you can think of it like a movie script [or] potential pathways through your conversational system. You write down what the user might say whether you're [designing for] voice or text. And each one of these individual sample dialogues is just one possible pathway.
The great thing about sample dialogues is that it's really low fidelity. It's very easy for people to understand - like getting stakeholders involved or if you're just throwing around use cases. And what we recommend is once you've written sample dialogues - even if it's for a text bot - read it out loud, especially where somebody is playing the user who is not familiar with it. And you'll be shocked at how many things right out of the gate you find problems with.
You might find that you can't handle certain responses. Maybe you find something interesting that you can add to. So absolutely I would recommend you write sample dialogues. Anyone can do it. You don't have to be a great writer. It's just a great way for you to get your ideas out there, so you can start finding the good parts and the bad parts.
Cathy: If you're lucky enough to have multiple conversation designers on your team - which I know a lot of companies are lucky to even get one - it's great to have someone who has a technical bent where they really get the speech recognition portion of it. They understand the basics of ASR and NLU and can really utilize that to remind the team of like, oh, this won't work because of this. So it's great to have somebody who's an expert there.
It's also great to have somebody who's just a really good writer. I can write decent prompts - but sometimes, you know, when I want them to be better, I'll go to a colleague. Some of you may know my colleague James Giangola, who co-wrote one of the first voice user interface design books. And I'll go to James and ask him to make it better. And he'll just turn out these beautiful prompts.
Some people just really have a skill for that, and it's great to have somebody on your team like that. And then somebody who is skilled in multimodal design is great. Somebody maybe who has a little bit of a visual design background, who can really help with those conversational elements.
But I think the thing I'd say is most important for a team to have [is] the ability to communicate well even when they disagree and the ability to compromise. I think a lot of times as designers, we are artists and we say, this is my beautiful design and it must be so, or it will be terrible. Rarely does your design get built just the way you wanted it. That's very rare. So it's really nice to be able to say, okay, I can compromise sometimes on what I want. Sometimes I have to do it another way. I think you'll be happier that way and more successful.
Cathy: I think one, again, goes back to understanding speech recognition.
I find sometimes I'll talk to people who come out of a text-chatbot world, and they're very used to designing for text only. I think some people have this idea that speech recognition is a solved problem, like, "oh, it's 95% accurate, we're good to go." [So] they don't realize that we still actually have a lot of issues. People may not know that short utterances, like yes or no, are actually still pretty hard to handle sometimes. Or if you were building something where you need to collect an email address, it's pretty easy for most people to type out an email address, but boy is that hard for a speech recognition system.
Also, I think people are used to looking at transcriptions to see, "well, how's my system doing?" The transcriptions really just aren't always accurate, and sometimes you miss issues because you can't hear what was actually said.
So again, having that knowledge that speech recognition doesn't always work. Things like a voice assistant [for example]. You may say something, but it may not always hear you speak. So if you write a prompt that says, "you should say something now," it is infuriating to the user. [Maybe] they did say something, and the assistant didn't hear [it]. And so just having that knowledge can really benefit your design.
Cathy: That is a really good question.
First, of course, I'll say I can't speak for all companies or recruiters. But for myself...a portfolio should represent some of the work you've done. And as far as how you represent that - especially if it's a voice-only experience - there are some different ways, but I like to see process. I like to see, [for example, that] you had this idea, and here are [your] first thoughts about it, like a sample dialog or a flow or something like that. Maybe you realize that's not going to work. So [you show] a revised flow.
I love to see when people have insights or an "aha" moment during [the] design process. Because nobody designs things perfectly from scratch. So when I see a perfectly polished, finished mock or experience in a portfolio, I'm like, "but how'd you get here?"
In terms of how you represent this kind of stuff, there are some different ways.
One thing you can do - like if you're demonstrating something like an assistant action - is you could just record a video of yourself talking to it or [even] just the audio. So you could have a recording of an interaction [along with] showing things like sample dialogues or flows or other artifacts of your project that are also really useful.
Another thing I often advise people is...I like people to highlight fewer things rather than more things and go deeper into [those] few things. Go deep into one thing you've designed, and again, any "aha" moments you had or pivots you took. I find that really helpful because it feels like I get to know your problem solving a little bit more.
If I have to go to a portfolio and look at 15 different things to get a feel for what you do, it's going to be too challenging. Just remember, people who look at your portfolio probably don't have a ton of time. You might want to pick three projects, if you can, and expose different areas of your expertise.
I also get asked a lot, "well, I can't get a job because I don't have any experience, and I can't get any experience because I can't get a job." So in those cases, again, you can still build a portfolio with your sample projects. If you use Voiceflow or something and you make a prototype, you can use that in your portfolio. You don't have to use just stuff that was launched.
So that's a great way to get started, even if you don't have the job as a conversation designer [yet], but you want to show off some of your skills.
Cathy: I think one of the top ones is you assume anyone using your system knows what it's going to do because you know. People love the idea of having super conversational, open-ended stuff, but most companies can't actually build that.
So it's ok to have a fairly directed experience. For example, "welcome to Cathy's Trivia Game - do you want to play easy or hard?" [You want to] get the user to have a successful interaction right off the bat with a simpler question before you throw them into your experience. So I think being too open-ended and not explanatory enough is one thing.
Another huge thing I still see in 2020 is just really poor error handling. Because again, you've got to know that people are going to say or type things that you didn't plan for. It's going to happen. And if you go with your default, "I didn't understand, please say your command again" - you're doomed. So really investing in [error handling] is something I still see people not doing.
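To make the error-handling point concrete, here is a minimal sketch of escalating reprompts for the trivia example above. Each miss returns a more explicit prompt instead of repeating one generic "I didn't understand," and after the reprompts run out the handler falls back to a sensible default. The class name, prompt wording, and fallback behavior are all assumptions made for illustration; they are not from the AMA or from any particular platform.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical prompt copy for the "easy or hard" trivia question.
TRIVIA_REPROMPTS = [
    "Sorry, was that easy or hard?",
    "You can say 'easy' for beginner questions or 'hard' for a challenge. Which one?",
]
TRIVIA_FALLBACK = "No problem, let's start with easy. You can switch at any time."


@dataclass
class EscalatingErrorHandler:
    """Tracks consecutive misses for one question and escalates the help given."""

    reprompts: List[str]
    fallback: str
    misses: int = 0

    def on_miss(self) -> str:
        """Return the next, more explicit reprompt, or the fallback once they run out."""
        if self.misses < len(self.reprompts):
            prompt = self.reprompts[self.misses]
        else:
            prompt = self.fallback
        self.misses += 1
        return prompt

    def on_success(self) -> None:
        """Reset the miss counter once the user has been understood."""
        self.misses = 0


handler = EscalatingErrorHandler(TRIVIA_REPROMPTS, TRIVIA_FALLBACK)
first = handler.on_miss()   # short, conversational reprompt
second = handler.on_miss()  # spells out the options
third = handler.on_miss()   # gives up gracefully and picks a default
```

The design choice here is simply that the system, not the user, takes responsibility for recovering: the wording gets more helpful with each miss, and the conversation keeps moving even if recognition never succeeds.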
And then finally, I think one thing that I also see is people write a prompt and they think, "okay, I've asked the person this question, and they're obviously going to answer with something related." [However], they often don't. And [so] designers and builders think, "what's wrong with these users?" Or, "I told them what to say, and they didn't say it."
[You need] to come to terms and accept that humans don't reply exactly the way you want. They are not going to say something bizarre or out of the blue, [however]. For example, if you say, "how many people in your reservation?" they're not going to reply, "how tall is Barack Obama?" Although they might say, "do you have outdoor seating?" A perfectly legitimate question. And you should be able to handle those domain-specific things people say.
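The reservation example can be sketched as a single turn handler that accepts either a direct answer or an in-domain side question. This is a toy illustration under stated assumptions: the function name, the canned answers, and the keyword matching are hypothetical, and a real system would rely on an NLU model rather than substring checks (this sketch only recognizes party sizes given as digits).

```python
import re
from typing import Optional, Tuple

PARTY_SIZE_PROMPT = "How many people are in your reservation?"

# Hypothetical answers to side questions a caller might plausibly ask
# while making a reservation.
SIDE_QUESTIONS = {
    "outdoor seating": "Yes, we have a patio.",
    "parking": "There's street parking out front.",
}


def handle_party_size_turn(utterance: str) -> Tuple[str, Optional[int]]:
    """Return (system response, party size or None if still unknown)."""
    text = utterance.lower()

    # Direct answer: the user gave a number (digits only in this sketch).
    match = re.search(r"\d+", text)
    if match:
        size = int(match.group())
        return f"Got it, a table for {size}.", size

    # In-domain digression: answer it, then return to the open question.
    for keyword, answer in SIDE_QUESTIONS.items():
        if keyword in text:
            return f"{answer} {PARTY_SIZE_PROMPT}", None

    # Anything else: reprompt without blaming the user.
    return f"Sorry, I missed that. {PARTY_SIZE_PROMPT}", None


response, size = handle_party_size_turn("Do you have outdoor seating?")
```

The point of the structure is that a related question is treated as a normal turn, not an error: the system answers it and then re-asks what it still needs.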
Cathy: There's a few places - like on Twitter I recently posted a couple of links to articles. One was a summary of conversation design courses that are available right now. [Previously, there were] no formal course[s] you could take, but now there's multiple ones where you can sign up and learn online. Even sometimes get a certification.
There's tons of great podcasts out there about voice [that] you can learn from. [There are] lots of great videos on my website cathypearl.com. I have an FAQ section where I also link to some other books that are interesting.
Some [helpful] books that aren't just about, say, voice design, but are more about human conversation are:
1. How We Talk: The Inner Workings of Conversation by N.J. Enfield
2. Talk - The Science of Conversation by Elizabeth Stokoe
3. Wired for Speech - How Voice Activates and Advances the Human-Computer Relationship by Clifford Nass
4. Voice User Interface Design by Michael H. Cohen, James P. Giangola, Jennifer Balogh
5. Ubiquitous Voice: Essays from the Field by Lisa Falkson
These are great books to help you understand how humans talk, which I really think is beneficial when we design for computers. I also have a link to an article I wrote called, "How to become a conversation designer," which again has some links and ideas for people looking to get started or [those] looking for more resources.
But the main thing I say to people is that the best way to learn is to try - to use a tool like Voiceflow that lets you experiment. You don't have to be able to code to get something up and running. You can put it out there, you can get feedback. You will learn so much, so fast. And I think that's the absolute best way to get started.