The secrets to building a world-class IVR system

Stuart Silverstein helps companies up their efficiency and work smarter by designing the right systems and processes. Working as the Director of Service Design at AIG Insurance, Silverstein has a wealth of knowledge and experience consulting for teams on platform and innovation design, experience ecosystems, product strategy, and platform/UX design.

We sat down with Stuart to talk IVRs and how his extensive background in UX design has given him an edge when it comes to building telephone systems. Watch the recording below, or keep scrolling to read the highlights.

Q1: What are some of the things you've worked on at AIG?

Stuart: I'm the Director of Service Design at AIG. I work in the annuities department, [and] so I work on financial products. I'm an enterprise platform UX designer, and so even though [service design] is in my title [...] my background is in hardcore UX with a specialty around platforms and enterprise.

I mainly work on complex systems. In my current role, I'm working on both the consumer side and on the internal side as well.

So, when was the last time you actually called a company and had a really great calling experience? It's something that you dread. It's something you really don't want to do. It's kind of that option of last resort.

So with that...what I was tasked with was designing a best-in-class experience for our customers. And the thing about this was they wanted a UX look at it. Typically what happens with these things is that it's done by a tech team - where the tech team basically takes orders and says, 'hey, what do you want?' And then comes up with something that may be very functional, but not based on best practices. It's not based on what we would consider a great user experience.

So with that, our challenge was basically to create a best-in-class modern experience. A couple of things we had to include as part of this work was natural language processing. We had to do intelligent routing - which means that a lot of times when you call these systems, the user is asked to press one, or press two, or press three, or do certain things. And they don't always figure out [how] to get to the right place. So we had to do that for them to get them to the right person, and quickly. And then also, we wanted this to be like a conversation. The best experience is when you start talking about an IVR or phone experience, and...the machine is responding to you, and it feels like you're talking to a real person.

Q2: Explain the process behind planning an IVR experience from a UX perspective?

Stuart: When we started talking about the process, the first thing we had to [deal with] was [that] there was no precedent for this internally. Personally, I've worked in UX for many years, [and] this was kind of a new process for me as well. So what we had to do was use the best practices from all our backgrounds as UX'ers and try to apply them to the voice design process.

So this started out with mapping out the process, where we figured out a few things around authentication, what we wanted to achieve, and what success would look like. We created a prototype and then tested it with users, creating the same human-centered UX feedback loop that we would [use] with any product.

This was not my first project working in voice. I've actually built other things in voice but they were a combination of voice experiences with an actual GUI. This was the first time I've worked in a situation where it was pure VUI. And when I started working out these challenges, there are certain things that you have when you're working with a pure voice experience that you don't have when you have the benefit of either a GUI interface or a combination of using voice with a mobile app or a web experience. You don't have any visual cues. [You] don't have icons. [You] don't have color. We don't have a lot of those things at our disposal to help the user understand what's going on.

We also don't have any visual representation of location. Meaning that where you are in the app...there's no breadcrumbs. There's no global navigation. It all has to be visually done without seeing these things.

Confirmations are also another thing that are a bit of a painful, necessary evil that we have to deal with. If somebody messes up, the ramifications of someone making an error are a lot [greater]. It's harder to navigate back and forth. And this is also a real-time experience, right? In web, things can jump around. It's nonlinear. I can go wherever I want. When you start talking about a linear experience, [you] have to very much understand that impression.

The other thing is that while voice has [come] a long way - it still fails, right? Garbled voices, different words that they're using, or ways that they're phrasing. We want to make sure that we're understanding and anticipating those needs. Then, as soon as we detect some sort of frustration, we address it in some way. This [brings us] to utterances and intents. We need to program those and make sure we've got them right.

We're also going to have to deal with choosing voices and recording — something very specific to IVR. If you don't come from a creative background where you've had to do radio ads or something like this, this could be totally new where you have to pick talent, you have to choose the right person. You've got to pick the gender of [that] person. And then also record it. And then this also comes down [to] performance and intonation. How are you going to read these things off?

The other thing that's very unique to IVR is at what point do you get this to a real person? Because ultimately what we want to do is [...] create some self-contained experience where people can get most of their questions answered on their own and then transfer them to an agent, in which case they have as much information as possible.

Part of the business challenge is we want to [contain] those calls and keep those calls as short as possible. [Where we have] only the calls that need to go to an agent, go to an agent.

Q3: Walk us through how you would prototype an IVR

Stuart: So the first thing we did when we started out with this [project] was trying to figure out, 'how are we going to prototype this thing?' Ultimately, writing scripts on a piece of paper was not going to achieve what we wanted to achieve. We weren't going to be able to hear it, feel it, try a bunch of things [and] test the logic of it.

It was going to be very flat on paper, so we considered building it within the telephony system. [However,] that was going to be really cumbersome. So we landed on Voiceflow after doing some research, and we figured out that we could very well emulate a voice experience, and that the Alexa skill was going to be very similar to any voice experience minus the touchtones. And that's what we wound up choosing.

We also used Sketch and InVision for collaboration and mapping out prompts, data flows, and all that stuff. Although I didn't realize Voiceflow now offers the ability to add a lot of those things, so it would be interesting to try that at a later date. But [for this project] we kept those things separate.

So the first thing we did was start with [the] typical things we would do with any UX experience. We started out determining what flows we were going to need to prototype. What were the tasks and the use-cases. We also needed to find out authentication criteria, which I'll get to a little later on. We also had to find out specific conditions that were going to trigger personalized messages.

Because we want these things to be very efficient, we don't want the user to pick a lot of items. We want an intelligent enough system where we know what it is that they're trying to accomplish.

So, as a result, we picked specific things that we knew people would be calling about, and then we brought them all the way up to the top [where we would say], 'hey, are you calling about this piece?' Let's get you to the right person or into an automated flow or the information that you need straight away from the top.
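To make the idea of bringing likely call reasons "up to the top" concrete, here is a minimal Python sketch. The account flags (`pending_withdrawal`, `form_recently_mailed`) and the prompt wording are assumptions for illustration only; they are not taken from the system Stuart describes.

```python
def opening_prompt(account: dict) -> str:
    """Pick the first question to ask, based on what we already know."""
    # Hypothetical account flags; a real system would read these from its own records.
    if account.get("pending_withdrawal"):
        return "Are you calling about your pending withdrawal?"
    if account.get("form_recently_mailed"):
        return "Are you calling about the form we recently mailed you?"
    # Nothing stands out, so fall back to the open-ended question.
    return "How can I help you today?"
```

The ordering of the checks is the design decision here: the most likely reason for the call is asked about first, so most callers never have to pick from a menu.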

We also had to do personas and segmentation. The same stuff you do with any kind of UX/UI experience. And then we mapped out the script and the flows.

Q4: What does the structure of an IVR experience look like?

Stuart: When you start talking about an IVR experience, we talk about this idea of a beginning, a middle, and an end. [At] the beginning of it, [you can think of this as] your intro [along with] any conditional messages you might have. For instance, if there is an outage, if you're experiencing a high call volume, etc.

And then the other part of that is we're going to want to authenticate and identify what the course of this conversation is about. It could be something like a social security number or it could be something personally identifiable. It could also be a tracking number. Any of those kinds of ideas [helps us] first identify the context of this conversation.

Once we've identified you, we're going to give you a personalized experience. Something that's going to be unique to you. That could be menu items or some way that I identify who you are. We're going to want to have some main or global menus. There's going to be these automated service microflows [that will] branch off from our main hub. These little microservices or microflows will be [used] for specific tasks. After that you have relevant messages that direct you to the right spot.

Once we get to the end portion of this, does the user just hang up and they're done? Or do they need to get additional help? If they need to get additional help, how is that transfer going to work? It could be a callback, where you say, '[do] you want us to call you back at that number?' If someone does opt to be on hold, what is that hold experience? Is it something that gives you a sense of movement? Is it something that gives you a sense of brand? And then also, what information is then passed to that CSR (customer service representative), because there's nothing more annoying than validating and spending your time authenticating only for a CSR to ask you the same question [again]. And [finally] routing, because ideally, when I talked about that intelligent routing that happens upfront - we want to make sure we get the right person to the right place.
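One way to picture the information passed to the CSR is as a small record that travels with the call. The sketch below is an assumption-laden illustration: the `CallContext` fields and the `skip_reauthentication` flag are hypothetical names, not part of any particular telephony product.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class CallContext:
    """Everything the IVR has learned about the caller so far."""
    caller_number: str
    authenticated: bool = False
    account_id: Optional[str] = None
    intent: Optional[str] = None
    steps_completed: list = field(default_factory=list)


def build_handoff(ctx: CallContext) -> dict:
    """Package the context so the agent's screen can show it on transfer."""
    payload = asdict(ctx)
    # Tell the agent whether they can skip re-verifying the caller.
    payload["skip_reauthentication"] = ctx.authenticated and ctx.account_id is not None
    return payload
```

For example, `build_handoff(CallContext("+15550100", True, "A-123", "withdrawal"))` yields a dictionary in which `skip_reauthentication` is true, so the agent does not need to ask the same questions again.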

Q5: What are some of the ways you carry out authentication in IVR systems?

Stuart: Almost all phone systems require some sort of authentication [that] you're going to pass on to an agent [...] or to personalize the experience. There are a few ways you can do this:

  • ANI (automatic number identification): You can use their caller ID as one of your points of authentication to personalize the experience.
  • Web transfer
  • Last 4 digits of social security number
  • Order number
  • Account number
  • Card number

Any of those ways are things we can use to identify people. When we're talking about authentication, we want to make sure that most of these authentication pieces are not things that people have to go look up, but [rather] stuff they already know off the top of their head.
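As a rough illustration of combining two of these factors, the Python sketch below uses the caller's number (ANI) to find a record and then checks a second factor the caller knows by heart. The in-memory `CUSTOMERS_BY_PHONE` table and its fields are made up for the example, and this is a sketch of the flow, not security guidance.

```python
from typing import Optional

# Hypothetical customer records keyed by the phone number on file.
CUSTOMERS_BY_PHONE = {
    "+15550100": {"name": "Pat", "ssn_last4": "1234"},
}


def identify_by_ani(caller_number: str) -> Optional[dict]:
    """Use the caller ID as the first point of identification."""
    return CUSTOMERS_BY_PHONE.get(caller_number)


def authenticate(caller_number: str, spoken_last4: str) -> bool:
    """Confirm the caller with something they know off the top of their head."""
    customer = identify_by_ani(caller_number)
    if customer is None:
        # Unknown number: a real flow would ask for another identifier instead.
        return False
    return spoken_last4 == customer["ssn_last4"]
```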

Q6: Explain the importance of deciphering caller intent

Stuart: After authentication, the intent is the next most important piece to get. It can be used to determine the need for certain CSR skills and intelligent routing.

As [a caller] goes through your journey, you should be able to map where you're going to capture those caller intents. What actions is somebody going to take that will tell you this is what they're trying to achieve and where they need to go? 

This may have something to do with skills routing; and in most organizations it does. This is because not all CSRs have the same skills, and they can't do the same things. So within your organization, you want to find out what [they] need in order to be able to route [the caller] to the right place.

The other thing is that if something is taking place, we want to [confirm] that we know where they're going.
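Skills routing can be sketched as a lookup from the captured intent to a required skill, followed by a search for an available agent who has it. The intent names, skill names, and agent records below are hypothetical.

```python
# Hypothetical mapping from caller intent to the CSR skill it requires.
SKILL_FOR_INTENT = {
    "withdrawal": "disbursements",
    "get_form": "forms",
}


def route(intent: str, agents: list, default_queue: str = "general") -> str:
    """Return the name of the first available agent with the needed skill."""
    skill = SKILL_FOR_INTENT.get(intent)
    for agent in agents:
        if agent["available"] and skill in agent["skills"]:
            return agent["name"]
    # No match (or unknown intent): fall back to a general queue.
    return default_queue
```

With agents `[{"name": "Ana", "skills": {"forms"}, "available": True}]`, a `get_form` intent is routed to Ana, while an intent with no matching agent falls back to the `general` queue.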

Q7: Explain the use of intents, utterances, and slots in an IVR system

Stuart: So as I mentioned earlier, the intent is what the user is trying to accomplish. So when we're doing our mapping, we want to make sure we map out each of our intents to a flow. Meaning once somebody says, 'I would like to select the get a form flow,' I'm going to need to map [it] and all the microflows that go along with it.

[An utterance] indicates that somebody is trying to do something. We wound up mapping all of our utterances within Voiceflow, [which] is what we use to check and see what kinds of things we might want to say within our prototype.

Then there are slots and variables which can be used for verification or within the system [itself]. There's a bit of a grey area between slots and variables. Slots are typically used within the context of the conversation and variables are things that we keep track of within the prototype that are not really used within the context of the conversation.
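The relationship between the three terms can be shown with a small Python sketch: utterances are the phrases that signal an intent, a slot is a value pulled out of what the caller said, and a variable is bookkeeping the caller never hears. The intent names, phrases, and the amount slot are assumptions for illustration; a real platform such as Voiceflow or Alexa does far more robust matching than this substring check.

```python
import re
from typing import Optional

# Each intent is mapped to sample utterances that signal it (hypothetical examples).
INTENTS = {
    "get_form": ["i need a form", "send me a form", "get a form"],
    "account_value": ["account value", "how much is in my account"],
}

# A variable: tracked by the system, never spoken in the conversation.
session_variables = {"retry_count": 0}


def match_intent(spoken: str) -> Optional[str]:
    """Return the first intent whose sample utterance appears in the speech."""
    text = spoken.lower().strip()
    for intent, utterances in INTENTS.items():
        if any(u in text for u in utterances):
            return intent
    return None


def extract_amount_slot(spoken: str) -> Optional[float]:
    """A slot: a dollar amount taken from the caller's own words."""
    m = re.search(r"\$?(\d+(?:\.\d{2})?)", spoken)
    return float(m.group(1)) if m else None
```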

Q8: How do IVRs handle error-correction?

Stuart: So we're talking about mistakes in navigation. We want to offer the option to correct as people make mistakes. So repeating automated prompts is a good one [along with] universal access to a main menu or representatives. This means if somebody gets frustrated, the first thing they'll say is, 'operator' or 'representative.' We want to make sure that those are global prompts or global intents — [which means] if somebody says, 'I want to get a representative [or] I'm done with the automated service,' we can get them out of there.
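A global intent can be thought of as a check that runs before whatever flow the caller is currently in. The following Python sketch assumes a hypothetical `flow_handler` callback for the current flow and a made-up phrase list.

```python
# Phrases that should work anywhere in the call (hypothetical list).
GLOBAL_INTENTS = {
    "representative": ("operator", "representative", "agent"),
    "main_menu": ("main menu", "start over"),
}


def resolve(spoken: str, flow_handler) -> str:
    """Check global intents first, then defer to the current flow."""
    text = spoken.lower()
    for intent, phrases in GLOBAL_INTENTS.items():
        if any(p in text for p in phrases):
            return intent
    # Not a global request, so let the active flow interpret it.
    return flow_handler(text)
```

Because the global check comes first, a caller who says "representative" in the middle of any microflow is taken out of it immediately.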

The other thing is [to] plan for system failures: repeated failures with progressive information, and fallbacks. That means that if I make an error the first time, I may be mildly frustrated. By the second time, I'm a little more frustrated. And by the third time, I'm pretty darn frustrated and I'm fighting over it.

Voice systems fail for all kinds of reasons. Anyone who has an Alexa has experienced [this]. So what we need to do is plan for these failures elegantly. And so the first prompt may be something short and sweet like, 'how can I help you today?' The second prompt might say, 'I didn't quite get that, can you try saying withdrawals or account values?' And then by the third or fourth time, we might [push you to an operator] and not let you sit through this IVR anymore.

By the time you get to that third prompt - that's my rule of thumb - don't put them through anymore!
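The escalation described above can be sketched in a few lines of Python. The first two prompts are the ones quoted in the interview; the transfer message and the function name `next_step` are assumptions added for the example.

```python
from typing import Tuple

# Prompts get more specific with each failed attempt.
PROMPTS = [
    "How can I help you today?",
    "I didn't quite get that. Can you try saying withdrawals or account values?",
]
MAX_ATTEMPTS = 3  # rule of thumb from the interview: stop at the third attempt


def next_step(attempt: int) -> Tuple[str, str]:
    """Given the 1-based attempt number, decide what the IVR does next."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt >= MAX_ATTEMPTS:
        # Hypothetical wording for the handoff to a person.
        return ("transfer", "Let me get you to someone who can help.")
    return ("prompt", PROMPTS[attempt - 1])
```

Calling `next_step(1)` and `next_step(2)` returns the two prompts in order, and `next_step(3)` returns a transfer instead of a third prompt.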

Q9: After you've created an IVR prototype, how do you conduct user-testing?

Stuart: Once we get into this idea of testing with users on the Voiceflow application, we found a couple of ways to do it.

One of them is to upload it to an Alexa app and [then] get somebody's email and have them [test it] on their own Alexa app, which is a good way to do it. You can get them to a pretty real prototype where they can actually feel it. [This] works more like the real thing.

The other [options are] the web version of the interface or doing it through the phone. This is where you have a phone and you hold it up to your mic [with the Alexa app open] and [have the caller] speak [to it]. Both of those seem to work pretty well overall.

If you get to the point where you're [working with] a developer, you might want to take all your intents and everything else and create an Amazon Connect prototype. Amazon Connect is the telephone system that Amazon uses, and a lot of stuff can be reprogrammed. There's still a level of effort there [even] if you've got a really good developer, [and so] you might want to pair with a developer on that.

Q12: How do you normally generate your list of user-testers?

Stuart: I use the same sources that you would use for any sort of UI type testing. There are all these different ways that you can recruit users depending upon your organization.

And then you can [conduct] these through a couple of different ways. You can do it through the same platforms. Let's say you're using Lookback or Validately or something like that, you can just record [the call] just like any other kind of interview. You can [also] use Zoom or WebEx or whatever video sharing platform you choose. You record the interview and take it back with you to analyze.

Q10: How do you offer scalability (thousands of intents) to organizations when recording voice vs. using a text-to-speech provider?

Stuart: That is a challenge. Text-to-speech can be recorded on a regular [basis], and you can change things on the fly. You can change the type-up, and then it reads right back to you.

[On the other hand], if you have to get things recorded, you have to send it out to a voiceover artist, have them record it, and send it back to you. It's really slow and it can take at least a day or two.

I think that's where you have to figure out in your organization how you can seamlessly blend those ideas together to be able to provide an experience. [For example,] if your requirements are that you need to provide a lot of data that [calls for] text-to-speech or quick delivery of text-to-speech type options, you're going to need to build that into your design.

And that could be [done] by [deciding] what kind of voice you choose [when picking] your voiceover artist. [Then] do you pick an Alexa voice or a voice that will blend in with [it]? [You may want to] choose some sort of text-to-speech that will blend with [your voiceovers] seamlessly, so it's not noticeable.

And so I think the better way to figure out how you're going to blend it is [finding out] where you can use recorded voice, and where you need to use text-to-speech, and making that [transition] seamless in some fashion.
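One possible way to blend the two is to play a recording wherever one exists and fall back to text-to-speech for everything else, such as account-specific data. The Python sketch below assumes a hypothetical table of recorded prompts and made-up file names.

```python
# Hypothetical library of prompts recorded by a voiceover artist.
RECORDED_PROMPTS = {
    "welcome": "welcome.wav",
    "balance_intro": "balance_intro.wav",
}


def plan_playback(segments: list) -> list:
    """Turn (prompt_id, text) pairs into play-recording or speak-text steps."""
    plan = []
    for prompt_id, text in segments:
        if prompt_id in RECORDED_PROMPTS:
            # Static wording: use the recorded voice.
            plan.append(("play", RECORDED_PROMPTS[prompt_id]))
        else:
            # Dynamic data: no recording exists, so use text-to-speech.
            plan.append(("speak", text))
    return plan
```

A sentence such as "Your account value is $12,345.67" would then be split into a recorded lead-in and a synthesized amount, which is why the two voices need to sound alike.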

Q11: What IVR pain points can be addressed by multimodal voice + touch? Or the reverse?

Stuart: When you have a multimodal situation there's a couple of things that happen. In my experience, you've got a visual representation of where you are [and] you can [also] give more detailed information.

For example, if I've asked for something, and then I get a screen that tells me where I'm at, but also gives me more detailed information — those two items working together lowers the cognitive load. [But] when we're talking about pure IVR, you don't have any of those pieces.

Things that can help are getting them the right information that visually reinforces what they've asked for [which] may give them a better experience for them to scroll through. And then also visual cues — meaning possibly another option [where they can] jump in or initiate [a] voice [response] again. Those are a few things that can help with a multimodal experience.