Voice Glossary

A collection of popular terms for voice and conversational design.
Can't find what you're looking for? Check out our resources.

AI Assistants / Virtual Assistants

Consumer-facing AI systems that can carry out tasks or services for an individual based on commands or questions.

Access Token

A credential that represents the end user (resource owner) in another system. A token should identify the user in the other system. The access token is included in the requests sent to your skill if the user has successfully linked their accounts. (Source: Alexa glossary)
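As a sketch of where this token shows up in practice, the snippet below reads it from an Alexa-style request payload. The `session.user.accessToken` field follows the Alexa request format; the helper function and the token value itself are invented for illustration.

```python
# Illustrative sketch: reading the account-linking access token from an
# Alexa skill request. The payload shape follows the Alexa request format
# (session.user.accessToken); the token value here is made up.
def get_access_token(request_body):
    """Return the linked account's access token, or None if not linked."""
    return request_body.get("session", {}).get("user", {}).get("accessToken")

sample_request = {
    "session": {
        "user": {
            "userId": "amzn1.ask.account.EXAMPLE",
            # Present only after the user has linked their accounts:
            "accessToken": "example-token-123",
        }
    }
}

token = get_access_token(sample_request)
```

If the user has not linked their accounts, the field is simply absent and the helper returns `None`.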

Account Linking

An Alexa Skills Kit feature that lets you connect the identity of the end user with an account in another system. For example, a Car Hailer custom skill for ordering a ride needs to access the Car Hailer service as a specific user. Similarly, a smart home skill for controlling a light needs to connect the Alexa user with an account in the device cloud. (Source: Alexa glossary)

Action package

A JSON file that defines your Actions. This file includes information for the Actions directory listing, account linking information, a list of intents that the Actions can handle, and the actual fulfillment endpoints.
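To make the structure concrete, here is a minimal action package sketched as a Python dict and serialized to JSON text. The field names follow the (now legacy) `action.json` format from the Actions SDK; the conversation name and fulfillment URL are placeholders.

```python
import json

# A minimal action package, sketched as a Python dict and serialized to the
# JSON file the Actions SDK expects. Field names follow the legacy
# action.json format; the conversation name and URL are placeholders.
action_package = {
    "actions": [
        {
            "name": "MAIN",
            "intent": {"name": "actions.intent.MAIN"},  # default welcome intent
            "fulfillment": {"conversationName": "trivia_game"},
        }
    ],
    "conversations": {
        "trivia_game": {
            "name": "trivia_game",
            "url": "https://example.com/fulfillment",  # your webhook endpoint
        }
    },
    "locale": "en",
}

action_json = json.dumps(action_package, indent=2)
```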

Action phrase

A phrase that opens a specific Action when spoken to the Google Assistant. Also known as the "invocation phrase" or "implicit invocation intent". An example would be "Ok Google, open Trivial Pursuit".

Actions Simulator

A web tool for testing and debugging Actions in real-time. The simulator lets you test your Actions for all surfaces that the Google Assistant supports, without requiring a physical device.

Alexa Developer Console (ADC)

The Alexa Developer Console is a development tool from Amazon that allows developers to create, modify, test, distribute and certify Alexa skills. Coding is required.

Alexa Presentation Language (APL)

This is Amazon's voice-first design language that makes it easy to create visually-rich Alexa skills for millions of Alexa devices with screens. APL enables creators to build interactive voice experiences that include graphics, images, slideshows, and video and to customize them for different device types such as Echo Show, Fire TV and select Fire Tablet devices.

Alexa Skill

A set of actions or tasks that are accomplished by Alexa — Amazon's voice assistant. Skills are like apps for Alexa. They help customers perform everyday tasks or engage with content naturally through voice.

Alexa Skills Kit

A collection of APIs, tools and documentation for giving Alexa new capabilities.

Amazon Connect

Amazon Connect is an omnichannel cloud contact center service that provides a seamless experience across voice and chat for your customers and agents. This includes one set of tools for skills-based routing, powerful real-time and historical analytics, and easy-to-use intuitive management tools.

Amazon Polly

A service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. The text-to-speech service uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries. (Source: Alexa glossary)

Ambient Computing

This is a term used to describe a state where technology is omnipresent and accessible whenever required.

Application Program Interface (API)

This is code that allows two software programs to communicate with each other.


App

Stands for "applications". Apps are pieces of software written for a specific platform that are meant to do a particular task. For example, on the iPhone platform, you could create a calculator "app" that utilizes the software and hardware in the iPhone.

Automated Attendant (Digital Receptionist)

An interactive voice response (IVR) application that automatically answers, directs, and transfers incoming calls to an extension without the need for a phone operator or receptionist.

Automated Speech Recognition (ASR)

Computer technology that can identify and process the human voice. It is mainly used to convert spoken words into computer text. ASR is also used to authenticate users via their voice and to perform actions based on instructions defined by the user. Typically, automatic speech recognition requires preconfigured or saved voices of the primary user(s). It is also known as Automatic Voice Recognition (AVR).

Caller Intent

Caller intent identifies the reason for each phone call, thereby uncovering opportunities to personalize customer engagement, and to predict customer actions.


Chatbot

Programs that automate conversations on the web or over instant messengers.

Cognitive Load

The total amount of mental effort being used in the working memory, or how difficult it is for a user to understand or parse the information being presented to them. (Source: Alexa glossary)


Confirmation

When Alexa says something to make sure the customer knows she understood them correctly. Types of confirmation include implicit confirmation and explicit confirmation. (Source: Alexa glossary)

Conversation Design

Conversation design, at its heart, is about teaching computers to communicate like humans, and not the other way around. It’s about making these experiences easy and intuitive, and reducing frustration. At a more practical level, it’s about designing experiences that include conversational interactions, whether that’s through a voice user interface, a voice-forward screen, or a multi-modal device like a mobile phone that may include typing, tapping and swiping.

Conversation Designer

Also known as VUI designers or conversational user interface designers, these individuals are responsible for designing and building out voice user interfaces and making conversations between humans and computers as seamless as possible. The discipline as a whole is made up of several design disciplines including voice user interface design, interaction design, visual design, motion design, audio design and copywriting. The goal of the conversation designer is like that of an architect, mapping out what users can do in a space, while considering both the user’s needs and the technological constraints. They curate the conversation, defining the flow and its underlying logic in a detailed design specification that represents the complete user experience. They partner with stakeholders and developers to iterate on the designs and bring the experience to life (source: Actions on Google) 

Conversational Artificial Intelligence (CAI)

Refers to the use of messaging apps, speech-based assistants (Amazon Alexa, Google Assistant, etc.) and chatbots to automate communication and enhance machine learning, which can in turn create personalized experiences at scale.

Conversational User Interface (CUI)

A conversational user interface is a platform that houses artificial intelligence-supported voice apps, chatbots and IVRs to hold verbal or written interactions with human users. The goal of CUIs? To mimic human conversation.

Dialog Errors

When something unexpected happens in the conversation between Alexa and the customer. Types of dialog errors include low confidence errors, timeouts/silence/no input, and false accepts.

Dialog Management

A design system that offers a more flexible way to design customer-centric voice experiences. This system involves writing more scripted dialog between the voice assistant (e.g. Alexa) and the customer so that you can take those conversations and convert them into storyboards.


Dialogflow

Dialogflow is a conversational platform that lets developers design and build Google Actions, chatbots, and conversational IVRs. Voiceflow allows you to export your projects to Dialogflow, where you can publish your Actions to Google Assistant. Unlike Voiceflow, coding is required.

Error Message

The message delivered to a customer when an utterance or technical error occurs during a dialog. (Source: Alexa glossary)

Exit Command

When the customer says a command like "exit" or "stop" to end the interaction.

Explicit Confirmation

A prompt that repeats back what Alexa heard and explicitly asks the customer to confirm whether they were correct. For example, "Alexa, ask Astrology Daily for my horoscope". Alexa would respond with, "You wanted a horoscope from Astrology Daily, right?" (Source: Alexa glossary)

False Accept Errors

When Alexa has mid to high confidence that she correctly understood what the customer said, but she actually misunderstood.

Flash Briefing Skill

Skills that have been built specifically for Amazon Alexa's 'Flash Briefing' feature, which provides users with news headlines and updates, event information, local weather reports and other forms of short-form content.


Fulfillment

A service, app, feed, conversation, or other logic that handles an intent and carries out the corresponding Action.

Google Action

A set of actions or tasks that are accomplished by Google's voice assistant.

Google Actions Console

A developer tool that lets you create, maintain, test and publish Actions.

Graphical User Interface (GUI)

A program interface that uses a computer's graphic capabilities to make it easier to use. GUIs make it possible for users to interact with electronic devices (computers, phones, gaming devices, etc.) through visuals like graphical icons. It is occasionally pronounced "gooey".

Happy Path

A happy path is a streamlined path of execution (in a voice app, for example) that features a default progression of events in which no exceptional or error conditions arise. It is ideal when building the simplest flow of logic through a system or task. Where the "happy path" falls short is in identifying and planning for unexpected inquiries that land outside the default progression of the event or task.


Hardware

The physical hardware portion of a platform, such as your physical iPhone. It is a shell that is useless without software giving it instructions for what to do.

Implicit Confirmation

A prompt that subtly repeats back what Alexa heard to give the customer assurance that they were correctly understood. In the following example, repeating back the word horoscope is a landmarking technique used to establish trust with the customer but still supports natural dialog. For example, "Alexa, ask Astrology Daily for my horoscope". Alexa would then ask to clarify the request with, "Horoscope for what sign?" (Source: Alexa glossary)

In-Skill Purchase (ISP)

With in-skill purchasing (ISP) for Alexa skills, you can make money through your skills by selling digital products to customers.

Intent Priority

When building Google actions, this refers to a feature that lets you assign different weights to intents for matching. If a user query can be matched to multiple intents, Dialogflow (Google's natural language understanding platform) is more likely to trigger an intent if it has a higher priority. (Source: Google design guidelines)


Intent

Tasks your assistant can do for you. Simply put, an intent is the user's intention in a given sentence or command. For example, if the user said "order me a large mocha coffee", the intent would be to order coffee. An intent doesn't relate to the specific words "order" and "coffee" but rather the goal the user is aiming for, which is to order a coffee.

Interaction Model

An interaction model is based on the idea that a computer needs specific information to understand human language. It provides the necessary information for a computer to understand and process a given voice request or command, incorporating utterances, intents and slots, which together map out a user's spoken input (see these definitions for more info).
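As an illustration, here is a minimal Alexa-style interaction model sketched in Python, showing how utterances (samples), intents and slots fit together. The invocation name, intent name and slot values are invented for this example.

```python
# Sketch of an Alexa-style interaction model tying together the building
# blocks of a voice request: utterances (samples), intents, and slots.
# The skill name and values are invented for illustration.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "coffee shop",
            "intents": [
                {
                    "name": "OrderCoffeeIntent",
                    "slots": [
                        {"name": "size", "type": "SIZE"},
                        {"name": "drink", "type": "DRINK"},
                    ],
                    # Sample utterances; {braces} mark slot positions.
                    "samples": [
                        "order me a {size} {drink} coffee",
                        "get me a {size} {drink}",
                    ],
                }
            ],
            "types": [
                {"name": "SIZE",
                 "values": [{"name": {"value": v}} for v in ("small", "medium", "large")]},
                {"name": "DRINK",
                 "values": [{"name": {"value": v}} for v in ("mocha", "latte")]},
            ],
        }
    }
}

language_model = interaction_model["interactionModel"]["languageModel"]
```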

Interactive Voice Response (IVR)

An automated phone system that provides pre-recorded voice responses that can interact with callers, gather information, provide information, and route calls to the appropriate recipients via voice or touchtones on a keypad device.


Interface

A device or program enabling a user to communicate with a computer.

Internet of Things (IoT)

The interconnection via the Internet of computing devices embedded in everyday objects, enabling them to send and receive data. Examples of objects that can fall into the scope of Internet of Things include connected security systems, thermostats, cars, electronic appliances, lights in household and commercial environments, alarm clocks, speaker systems, vending machines and more.


Invocation Name

When creating a custom Alexa skill, you will need to provide an invocation name that users will say to open your skill. For example, you might say, "Alexa, play Game of Thrones Quiz". The invocation name here would be "Game of Thrones Quiz".

JavaScript Object Notation (JSON)

JSON is a text-based data format inspired by JavaScript. It is used to transmit data between a server and a web application. In Voiceflow, JSON lets you transfer data from your Google Sheets to your project.
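For example, Python's standard `json` module round-trips between native objects and JSON text, the same text format used to exchange data between a server and a web application:

```python
import json

# JSON round-trip: a Python object serialized to a JSON string and parsed
# back into an equivalent object.
order = {"intent": "OrderCoffee", "size": "large", "type": "mocha"}

encoded = json.dumps(order)   # Python dict -> JSON text
decoded = json.loads(encoded)  # JSON text -> Python dict
```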

Low Confidence Errors

When Alexa has low confidence that she correctly understood what the customer said. When this occurs, Alexa cannot proceed in the interaction without asking the question again or ending the interaction.

Menu Style Prompt

A prompt that asks the customer a question intended to elicit a response from a small set of possible options (recommended 5 or fewer). For example, "Hi Mark, you can now hear about the following: your chequing account balance, your savings account balance, or your credit card balance. Which would you like to hear?"

Multimodal Experience

Combining voice, touch, text, images, graphics, audio and video in a single user interface. This enhances user interactions by providing information through both auditory and visual means. In a nutshell, it's both GUI and VUI together. Voice (audio) + Graphical Interface (visual). Example: Fire TV

Natural Language Processing (NLP)

Technology used to help computers understand natural human language. NLP lets people and machines talk to each other "naturally". An effective NLP system is able to ingest what is said to it, break it down, comprehend its meaning, determine an appropriate action, and respond in language the user will understand.

Natural Language Understanding (NLU)

NLU can be thought of as a subfield of NLP. NLU more specifically deals with machine reading, or reading comprehension. It goes beyond sentence structure and aims to understand the intended meaning of language. While humans can effortlessly handle mispronunciations, swapped words, contractions, colloquialisms, and other quirks, machines are less adept at handling unpredictable inputs. Enter NLU.

Open Ended Prompt

A prompt that asks the customer a question intended to elicit a wide range of responses. For example, "What would you like to do?"

Pattern Recognition

A branch of machine learning that utilizes patterns and regularities in data to train systems.


Platform

A group of technologies used as a base upon which other applications, processes or technologies are developed. In personal computing, a platform is the basic hardware (computer) and software (operating system) on which software applications can be run.


Re-prompt

A special kind of prompt used by Alexa when a response is not heard or clearly understood, usually in the form of a question after a dialog error has occurred. The general purpose of a re-prompt is to help the customer recover from errors.

Real-time Text (RTT)

Text that is transmitted in real time on a device as the user speaks.

Required Slot

A slot that contains values that are necessary for Alexa to complete the user's request. For example, "Alexa, ask Astrology Daily for the horoscope for Taurus". Without the name of the specific zodiac sign, Astrology Daily cannot provide a horoscope. If the user does not provide a value for a required slot, you must ask the user for that slot value. (Source: Alexa glossary)


Slot

In many scenarios, intents alone are not enough to fulfill a request. This is where "slots" come into play. Slots can be thought of as particular pieces of information that you have told the assistant to look for in the user's response. In the utterance "order me a large mocha coffee", we want our assistant to look for the coffee size and type. These are the slots we want to capture from the user's utterance. Here, we would assign a size slot to capture "large" and a type slot to capture "mocha".
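As a toy illustration of slot filling (production NLU engines use trained models, not pattern matching), the following Python sketch captures the size and type slots from that utterance:

```python
import re

# Toy slot filler: capture the "size" and "type" slots from an utterance.
# Real NLU engines use trained models rather than regular expressions;
# this only demonstrates the concept.
def fill_slots(utterance):
    match = re.search(r"\b(small|medium|large)\b\s+(mocha|latte|espresso)\b",
                      utterance)
    if not match:
        return {}  # no slots recognized
    return {"size": match.group(1), "type": match.group(2)}

slots = fill_slots("order me a large mocha coffee")
```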

Smart Cities

A smart city is an urban area that uses different types of electronic Internet of Things (IoT) sensors to collect data and then uses the insights gained from that data to manage assets, resources and services efficiently. This includes data collected from citizens, devices, and assets that is processed and analyzed to monitor and manage traffic and transportation systems, power plants, utilities, water supply networks, waste management, crime detection, information systems, schools, libraries, hospitals, and other community services.

Smart Home

A smart home, or smart house, uses devices in the home that connect via a network, most commonly a LAN or the internet. It uses devices such as sensors and other appliances connected to the Internet of Things (IoT) that can be remotely monitored, controlled or accessed, and that provide services responding to the perceived needs of the users.

Smart Home Skill

Skills that have been built specifically for controlling smart home appliances.

Smart Speaker

A smart speaker is a type of speaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation.


Software

The code that runs the hardware and makes it useful. Without software, hardware wouldn't have the logic or programs in place to actually do anything. Without hardware to run on, software is useless.

Software Development Kit (SDK)

An SDK is a collection of software development tools in one installable package. They make it easier for developers to create apps by packaging the necessary tools needed. For example, if you were to build a house, an SDK would include a toolbox specifically for constructing the kitchen. You could still use other tools, or even build your own, but an SDK offers something specific to solving problems or a theme of problems within that area.

Speech Recognition

The ability of an electronic device to recognize spoken words only, and not the individual voice characteristics of the user.

Speech Synthesis Markup Language (SSML)

A markup language used to improve the speech output of voice applications (like Alexa or Google Assistant). In simple terms, SSML can help Alexa or Google sound more natural. For example, you can add longer breaks between sentences or even emphasize a certain word.
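For example, the following snippet (held in a Python string) adds a pause with `<break>` and stresses a word with `<emphasis>`; both are standard SSML tags, though exact attribute support varies by platform.

```python
# A small SSML snippet as a Python string. <break> and <emphasis> are
# standard SSML tags supported by both Alexa and Google Assistant;
# attribute support varies by platform.
ssml = (
    "<speak>"
    "Welcome back."
    '<break time="500ms"/>'           # half-second pause between sentences
    'Your order is <emphasis level="strong">ready</emphasis>.'
    "</speak>"
)
```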

System Persona

The system persona is the conversational partner created to be the front end of the technology that the user will interact with directly. Defining a clear system persona is vital to ensuring a consistent user experience. Otherwise, each designer will follow their own personal conversational style and the overall experience will feel disjointed. (Source: Google design guidelines)

Text-to-Speech (TTS)

Converting written text into artificially produced speech using specialized software. It is also referred to as "read aloud" technology. It works on nearly every personal digital device today, including smartphones, computers and tablets.

User Flow

These are paths that users follow through an experience. Flows aren't necessarily linear, and can branch out in different paths.

User Persona

A user persona is a specific, but brief, description of the type of user who will interact with your voice app. Think of a few people you expect to use your skill or action. Try to have 2-3 different types, e.g., a millennial vs. a working parent. These user personas will help you avoid designing only for yourself and your goals. This ultimately helps you create authentic dialogs and more engaging experiences for customers. (Source: Google design guidelines)


Utterance

This can be anything the user says. For example, if the user said "order me a large mocha coffee", the entire sentence would be the utterance.

VUI Designer

Also known as conversation designers or conversational user interface designers. See "Conversation Designer".


Viewport

On an Alexa-enabled device with a screen or display, the viewport is the area of the display that the user can see.

Voice Assistant

A voice-activated piece of software that can supply information and perform certain types of tasks.

Voice Design

Voice design is the process of designing the possible interactions that may occur between a voice assistant and an end-user. Good voice design achieves a conversational flow that makes interactions with voice assistants feel natural.

Voice Prompt

A recorded message played by interactive voice response (IVR) systems, message-on-hold systems and other voice processing tools. The goal of the prompt is to guide the user towards their destination — for example, checking the funds in their savings account or finding out the amount they owe on their last credit card statement.

Voice Recognition

The ability of an electronic security device to recognize the voice of a particular person.

Voice User Interface (VUI)

A VUI allows users to interact with a system through voice or speech commands using speech recognition technology (Amazon Alexa, Google Assistant, Siri, Cortana, etc.). It is occasionally referred to as "v-ew-ee".

Voice-first design

Voice-first design refers to a system designed so that users interact with it primarily through voice. Examples of voice-first design include operating systems, smart appliances and smart speakers like Amazon Echo and Google Nest.


Voicebot

Programs that automate conversations over the phone or through voice assistants.

WOz Testing

Wizard of Oz (WOz) testing occurs when the thing being tested does not yet actually exist, and a human is "behind the curtain" to give the illusion of a fully working system. (Cathy Pearl, Designing Voice User Interfaces, 2016)

Wake Word

A special word or phrase that is meant to activate a given device once said. An example of these words or phrases would be "Alexa", "Hey Siri" and "Hey Google". These are also called "trigger words".

Wearable Technology

Wearable technology, or wearables, are smart electronic devices (with embedded microcontrollers) worn close to or on the surface of the skin. They detect, analyze, and transmit information such as body signals (e.g. vital signs) or ambient data, and in some cases provide immediate biofeedback to the wearer.
