Save more information with the improved Capture Step

Creating an experience that mimics natural human conversation is the goal of every conversation designer. Part of what makes conversation between humans feel so natural is the ability to ask a question, or multiple questions, capture and comprehend the answer, and react to that information.

Beyond promising to deliver a resolution, the success of any conversational experience can be gauged by how easily consumers can interact with it and how well the Bot recognizes and processes context.

Traditionally, conversational experiences had to be strict about how they captured information. Users couldn’t add any auxiliary statements if they wanted the Bot to recognize the right input. You could ask for someone’s name, but if they typed “My name is Sara”, the entire phrase would be captured as the entity. Obviously, that sort of behaviour isn’t what you want from a data-capture perspective, and it erodes the consumer’s trust for the rest of their interaction with your Bot.

But in order to create a seamless, user-friendly experience, Chatbots and Voice assistants need to be able to capture information in the ways that humans talk.

That’s why we made a massive, and much-needed, improvement to the capture step on Voiceflow! It will enable you to build vastly smarter and more dynamic conversational experiences.

Re-introducing the Capture Step

Where To Find The Capture Step

The first thing you’ll notice is that the Capture Step has moved from the Logic section to the User Input section, making it easier to find and placing it alongside other blocks that perform similar actions.

This is intuitive because the capture is ultimately waiting on user input. You’re also no longer able to add steps after a capture within a block - like a prompt or choice step, it must be the last step.

We did this because the capture block should end a “turn” in the conversation you’re designing.

After adding a capture step to your project, you’ll see that you now have the option of capturing the “Entire user reply” or specific entities.

These are the same entities that exist in your interaction model; they can be reused in intents or referenced in output steps like speak or text. Defining entities enables the artificial intelligence to extract the entity from a sentence, so your users’ experience of inputting their information is more seamless and conversational.

For example, if we were to ask the user for their name, even if they say “hey my name is Joe”, the Capture step can still figure out that the name is “Joe” and disregard the other parts of the response.
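The idea behind this extraction can be illustrated with a toy sketch. A real NLU model uses machine learning rather than pattern matching, and this function is purely an illustration of the behavior, not Voiceflow's implementation:

```python
import re

# Toy illustration of entity extraction: strip common lead-in phrases so
# that only the entity value ("Joe") remains from a longer reply.
def extract_name(reply: str) -> str:
    pattern = re.compile(r"(?:hey,?\s+)?(?:my name is|i am|i'm|it's)\s+",
                         re.IGNORECASE)
    return pattern.sub("", reply).strip().strip(".!")

print(extract_name("hey my name is Joe"))  # -> Joe
print(extract_name("Joe"))                 # -> Joe
```

Either way the user phrases it, the same entity value comes out, which is the behavior the new capture step provides.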

Conversation designers’ jobs just got a whole lot easier for these kinds of information-capture questions. Leveraging the artificial intelligence in this new capture step, you no longer have to set up utterances for the entities you’re capturing. Instead, you can add utterances for each entity, helping the machine learning model identify the different ways a user could express it.

Capturing Multiple Entities

In a human-to-human conversation, you rarely ask one question and wait for an answer. Instead, you’ll ask for one or two pieces of information at a time to help the other person find what you’re looking for - for example, name and email, name and confirmation number, or email and tracking number, depending on the context of the question.

With this new capture step, multiple entities can be added per step, extracting additional information from a single reply.

Each entity can have a prompt attached to it, so if the user doesn’t fill all the entities, we can ask for each one individually before moving on with the flow, ensuring all necessary information is captured in the right place.

The capture step now has a “No Match” section. If the user says something completely unrelated to what we are trying to capture, we can handle those paths and provide a better experience when it’s not understood.
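The per-entity prompting and No Match behavior described above can be sketched roughly as follows. This is an illustrative model of the flow, with made-up function names (`extract`, `ask`) standing in for the NLU and the channel, not Voiceflow's actual API:

```python
# Toy sketch of the capture step's entity filling (illustrative only).
# Each entity carries its own prompt; if a reply fills nothing and
# re-prompting fails, we fall through to the No Match path.
def fill_entities(entities, extract, ask, max_attempts=2):
    """entities: {entity_name: prompt}; extract: reply -> {entity: value};
    ask: prompt -> the user's next reply. Returns None on No Match."""
    filled = extract(ask("initial question"))
    if not filled:
        return None  # nothing recognized at all -> No Match path
    for name, prompt in entities.items():
        attempts = 0
        while name not in filled:
            filled.update(extract(ask(prompt)))  # ask for this entity alone
            attempts += 1
            if attempts >= max_attempts and name not in filled:
                return None  # still not understood -> No Match path
    return filled
```

For example, if the first reply contains only a name, the loop re-prompts just for the missing email before the flow moves on, so every entity ends up captured in the right place.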

Technical Details

To understand the new step better, let’s examine some of the problems with the old step. Here’s what it looked like:

Out of the gate, we can see that to capture a “car” entity, we have to provide all the example values and declare a {car} variable to capture input into, all within the step itself. This car entity won’t appear in your interaction model, so you’d need to redefine it in every other capture step.

What is really happening in the background when the model is trained is something like this:

We’re just creating a hidden intent with a single utterance that contains only the entity as an example, then mapping that entity to another variable.
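In rough pseudo-structure, the old hidden intent might look something like this. The field names and example values here are assumptions for illustration, not Voiceflow's real schema:

```python
# Hypothetical shape of the hidden intent the OLD capture step generated
# behind the scenes (field names and values are illustrative):
old_capture_intent = {
    "name": "hidden_capture_car",
    "utterances": ["{car}"],                 # one utterance: just the entity
    "slots": {"car": ["Mercedes", "Toyota", "Honda"]},  # examples live in-step
    "mapping": {"car": "car_variable"},      # slot mapped to another variable
}
```

Because this intent has a real utterance, the NLP can match it from anywhere, which is the root of the problems described next.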

The biggest problem we face with natural language processing (NLP) is that it is always listening for every intent at every point in time. So even if the user is not currently on the capture step, it’s still possible to mistake their input for this capture intent. This leads to a lot of unpleasant experiences when entities are closely related (like a “music artist” vs “name” entity). This hidden capture intent pollutes the interaction model, causing many other intents to be misidentified.

Suppose we’re capturing the user’s name with a choice step intent elsewhere in the project. If my name happened to be “Mercedes”, the NLP might think I’m triggering the hidden car-capture intent rather than the name intent, and the choice step would register a no match.

The most important thing is context - letting the NLP know when to listen for certain intents and not others based on where the user is within the Voiceflow project.

What’s really happening in the new capture step is that we still create a hidden intent, but this time with no utterances and only required entities. This makes it impossible for the intent to be detected from anywhere else in the project unless explicitly triggered.

Here’s an example of what the capture intent looks like:
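In rough pseudo-structure (again, field names are assumptions for illustration, not Voiceflow's actual schema), the new hidden intent is the mirror image of the old one:

```python
# Hypothetical shape of the hidden intent the NEW capture step generates:
# no utterances at all, only required entities, so free-form user input
# elsewhere in the project can never accidentally match it.
new_capture_intent = {
    "name": "hidden_capture_name",
    "utterances": [],                   # empty: NLU cannot detect this intent
    "required_entities": ["name"],      # entity filling drives the turn
}
```

With no utterances to match, the only way to enter this intent is for the capture step itself to trigger it explicitly.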

You’ll notice that the entity section on the capture step is extremely similar to the “{entity} is required” section on intents.

Upon reaching the capture step, we trigger this intent and its entity filling on behalf of the user.

One of the biggest challenges with the new capture step was ensuring this feature works across various channels and has a unified experience in Voiceflow.

For technical details on how this was achieved on various channels:

LUIS/Voiceflow: Statefulness with Microsoft LUIS

Alexa: Use Intent Chaining to Enter Dialog Management from a Different Intent

Google Actions: Transition Scenes

Dialogflow: Custom Events

If you’re interested in seeing the new capture step in action, create your free Voiceflow account here and start designing more human conversations. If you're a Voiceflow user who wants to get started with a template for the capture step, use the free Capture template here!
