Voice Glossary

A collection of popular terms for voice and conversational design.
Can't find what you're looking for? Check out our resources.

AI Assistants / Virtual Assistants

Consumer-facing AI systems can carry out tasks or services for an individual based on commands or questions.

Action package

A JSON file that defines your Actions. This file includes information for the Actions directory listing, account linking information, a list of intents that the Actions can handle, and the actual fulfillment endpoints.

Action phrase

A phrase that opens a specific Action when spoken to Google Assistant. Also known as the "invocation phrase" or "implicit invocation intent". An example of this would be "Ok Google, open Trivial Pursuit".

Actions Simulator

A web tool for testing and debugging Actions in real-time. The simulator lets you test your Actions for all surfaces that the Google Assistant supports, without requiring a physical device.

Alexa Developer Console (ADC)

Amazon's development tool for building, testing, distributing, and certifying Alexa skills. You can use it to create, modify, and delete skills. Coding is required.

Alexa Presentation Language (APL)

This is Amazon's voice-first design language that makes it easy to create visually-rich Alexa skills for millions of Alexa devices with screens. APL enables creators to build interactive voice experiences that include graphics, images, slideshows, and video and to customize them for different device types such as Echo Show, Fire TV and select Fire Tablet devices.

Alexa Skill

A set of actions or tasks carried out by Alexa, Amazon's voice assistant. Skills are like apps for Alexa: they help customers perform everyday tasks or engage with content naturally through voice.

Alexa Skills Kit

A collection of APIs, tools and documentation for giving Alexa new capabilities.

Ambient Computing

This is a term used to describe a state where technology is omnipresent and accessible whenever required.

Application Program Interface (API)

A set of rules and definitions that allows two software programs to communicate with each other.

App

Short for "application". Apps are pieces of software written for a specific platform to perform a particular task. For example, on the iPhone platform, you could create a calculator app that uses the iPhone's software and hardware.

Automated Attendant (Digital Receptionist)

An application, typically part of an interactive voice response (IVR) system, that automatically answers incoming calls and directs or transfers them to an extension without the need for a phone operator or receptionist.

Automatic Speech Recognition (ASR)

Computer technology that identifies and processes human speech. It is mainly used to convert spoken words into text, which a system can then act on according to the user's instructions. Some ASR systems are trained on or adapted to the voices of their primary user(s), and related voice technology can authenticate users by their voice (see Voice Recognition). It is also known as Automatic Voice Recognition (AVR).

Chatbot

Programs that automate conversations on the web or in instant messaging apps.

Conversational Artificial Intelligence (CAI)

Refers to the use of messaging apps, speech-based assistants (Amazon Alexa, Google Assistant, etc.), and chatbots to automate communication and, with the help of machine learning, create personalized experiences at scale.

Conversational User Interface (CUI)

Conversational user interfaces are platforms that host AI-supported voice apps, chatbots, and IVRs, allowing them to have spoken or written interactions with human users. The goal of a CUI is to mimic human conversation.

Dialog Errors

Errors that occur when something unexpected happens in the conversation between Alexa and the customer. Types of dialog errors include low confidence errors, timeouts (silence or no input), and false accepts.

Dialog Management

A design system that offers a more flexible way to design customer-centric voice experiences. This approach involves scripting example dialogs between the voice assistant (e.g., Alexa) and the customer, which you can then convert into storyboards.

Dialogflow

Dialogflow is a conversational platform that lets developers design and build Google Actions, chatbots, and conversational IVRs. Voiceflow allows you to import your projects into Dialogflow, where you can publish your Actions to Google Assistant. Unlike Voiceflow, Dialogflow requires coding.

Exit Command

When the customer says a command such as "exit" or "stop" to end the interaction.

False Accept Errors

When Alexa has mid to high confidence that she correctly understood what the customer said, but she actually misunderstood.

Flash Briefing Skill

Skills that have been built specifically for Amazon Alexa's 'Flash Briefing' feature, which provides users with news headlines and updates, event information, local weather reports and other forms of short-form content.

Fulfillment

A service, app, feed, conversation, or other logic that handles an intent and carries out the corresponding Action.
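To make this concrete, here is a minimal Python sketch of fulfillment logic that dispatches on the intent name. The request shape (`{"intent": ..., "slots": {...}}`), the intent names, and the handler functions are all hypothetical. Real platforms such as Alexa and Dialogflow each define their own request formats and SDKs.

```python
from typing import Any, Callable, Dict

# Hypothetical, simplified request shape: {"intent": name, "slots": {name: value}}.
# Alexa, Dialogflow, and Actions on Google each define their own request formats.
Request = Dict[str, Any]


def order_coffee(request: Request) -> str:
    slots = request.get("slots", {})
    # Fall back to defaults when the user didn't mention a size or drink.
    size = slots.get("size", "medium")
    drink = slots.get("drink", "coffee")
    return f"Okay, one {size} {drink} coming up."


def get_hours(request: Request) -> str:
    return "We're open from 7 a.m. to 6 p.m."


# The fulfillment maps each intent name to the logic that carries it out.
HANDLERS: Dict[str, Callable[[Request], str]] = {
    "OrderCoffeeIntent": order_coffee,
    "GetHoursIntent": get_hours,
}


def fulfill(request: Request) -> str:
    handler = HANDLERS.get(request.get("intent", ""))
    if handler is None:
        # No logic registered for this intent, so respond gracefully.
        return "Sorry, I can't help with that yet."
    return handler(request)
```

For example, `fulfill({"intent": "OrderCoffeeIntent", "slots": {"size": "large", "drink": "mocha"}})` returns `"Okay, one large mocha coming up."`.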

Google Action

A set of actions or tasks that are accomplished by Google's voice assistant.

Google Actions Console

A developer tool that lets you create, maintain, test and publish Actions.

Graphical User Interface (GUI)

A program interface that uses a computer's graphics capabilities to make the program easier to use. GUIs let users interact with electronic devices (computers, phones, gaming devices, etc.) through visual elements such as graphical icons. It is often pronounced "gooey".

Happy Path

A happy path is a streamlined path of execution (in a voice app, for example) that follows a default progression of events in which no exceptional or error conditions arise. It is ideal for building the simplest flow of logic through a system or task. Where the happy path falls short is in identifying and planning for unexpected requests that fall outside the default progression of the task.

Hardware

The physical hardware portion of a platform, such as your physical iPhone. It is a shell that is useless without software giving it instructions for what to do.

In-Skill Purchase (ISP)

With in-skill purchasing (ISP) for Alexa skills, you can make money through your skills by selling digital products to customers.

Intent

A task your assistant can do for you. Simply put, an intent is the user's intention in a given sentence or command. For example, if the user says "Order me a large mocha coffee", the intent is to order coffee; words like "order" and "coffee" are the cues that identify it.

Interaction Model

Based on the idea that a computer needs specific information to understand human language, the interaction model provides the information a computer needs to understand and process a given voice request or command. It uses utterances, intents, and slots, which together map out a user's spoken input (see those entries for more information).
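The pieces of an interaction model can be sketched as plain data: intents, their sample utterances, and the values each slot accepts. This minimal Python sketch is loosely modeled on the `{SlotName}` placeholder style used in Alexa sample utterances, but the structure, names, and exact-match logic are hypothetical. Real NLU engines use statistical models that generalize well beyond the listed samples.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical interaction model. Sample utterances mark slots with {braces};
# each slot lists the values it accepts.
INTENTS: Dict[str, List[str]] = {
    "OrderCoffeeIntent": ["order me a {size} {drink}", "i want a {size} {drink}"],
    "GetHoursIntent": ["when are you open", "what are your hours"],
}
SLOT_VALUES: Dict[str, List[str]] = {
    "size": ["small", "medium", "large"],
    "drink": ["mocha", "latte", "coffee"],
}


def template_to_regex(template: str) -> "re.Pattern[str]":
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)  # literal words
        else:
            values = "|".join(re.escape(v) for v in SLOT_VALUES[part])
            pattern += f"(?P<{part}>{values})"  # named group for the slot
    return re.compile(f"^{pattern}$")


def match_utterance(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first matching sample utterance, else None."""
    text = utterance.lower().strip()
    for intent, templates in INTENTS.items():
        for template in templates:
            m = template_to_regex(template).match(text)
            if m:
                return intent, m.groupdict()
    return None
```

Here `match_utterance("Order me a large mocha")` resolves the utterance to `OrderCoffeeIntent` with the slots `size="large"` and `drink="mocha"`.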

Interactive Voice Response (IVR)

An automated phone system that provides pre-recorded voice responses that can interact with callers, gather information, provide information, and route calls to the appropriate recipients via voice or touchtones on a keypad device.
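The routing step of an IVR can be sketched as a lookup that accepts either a touch-tone digit or a spoken phrase. This is a minimal Python sketch: the routing tables and destinations are hypothetical, and the keyword matching is deliberately naive. A production IVR would rely on speech recognition and NLU rather than exact word matches.

```python
import string
from typing import Dict, Optional

# Hypothetical routing tables for a small business line.
KEYPAD_ROUTES: Dict[str, str] = {"1": "billing", "2": "technical support", "0": "operator"}
KEYWORD_ROUTES: Dict[str, str] = {
    "billing": "billing",
    "bill": "billing",
    "support": "technical support",
    "operator": "operator",
}


def route_call(caller_input: str) -> Optional[str]:
    """Map a touch-tone digit or a spoken phrase to a destination.

    Returns None when nothing matches, so the IVR can replay its voice prompt.
    """
    text = caller_input.strip().lower()
    if text in KEYPAD_ROUTES:
        return KEYPAD_ROUTES[text]
    for word in text.split():
        # Ignore punctuation attached to words in the speech-to-text transcript.
        word = word.strip(string.punctuation)
        if word in KEYWORD_ROUTES:
            return KEYWORD_ROUTES[word]
    return None
```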

Interface

A device or program enabling a user to communicate with a computer.

Invocation Name

When creating a custom Alexa skill, you need to provide an invocation name that users say to open your skill. For example, a user might say "Alexa, play Game of Thrones Quiz". The invocation name here is Game of Thrones Quiz.

JavaScript Object Notation (JSON)

JSON is a text-based data format derived from JavaScript object syntax. It is commonly used to transmit data between a server and a web application. In Voiceflow, JSON lets you transfer data from Google Sheets to your project.
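A JSON round trip shows what "transmitting data as text" means in practice: structured data is serialized to a string and then parsed back on the other side. This Python sketch uses the standard `json` module. The payload itself is made up for illustration.

```python
import json

# A small, made-up payload of the kind a voice app might exchange with a server.
order = {"intent": "OrderCoffeeIntent", "slots": {"size": "large", "drink": "mocha"}}

# Serialize the Python dict to a JSON string (the text that travels over the wire).
payload = json.dumps(order)
print(payload)  # {"intent": "OrderCoffeeIntent", "slots": {"size": "large", "drink": "mocha"}}

# Parse the JSON text back into Python objects on the receiving side.
received = json.loads(payload)
print(received["slots"]["drink"])  # mocha
```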

Low Confidence Errors

When Alexa has low confidence that she correctly understood what the customer said. When this occurs, Alexa cannot proceed in the interaction without asking the question again or ending the interaction.
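The "ask again or end the interaction" behavior can be sketched as a short reprompt loop that also covers no-input timeouts. This is a minimal Python sketch under stated assumptions: `recognize` is a hypothetical callable returning `(text, confidence)`, and the 0.5 threshold and two-attempt limit are illustrative values, not platform settings.

```python
from typing import Callable, Optional, Tuple

# Illustrative values only; platforms expose confidence differently (or not at all).
CONFIDENCE_THRESHOLD = 0.5
MAX_ATTEMPTS = 2

# Hypothetical recognizer: returns (recognized text or None on silence, confidence).
Recognizer = Callable[[], Tuple[Optional[str], float]]


def ask(question: str, recognize: Recognizer, say: Callable[[str], None]) -> Optional[str]:
    """Ask a question, reprompting on low-confidence or empty input.

    Returns the recognized answer, or None if the interaction should end.
    """
    say(question)
    for attempt in range(MAX_ATTEMPTS):
        text, confidence = recognize()
        if text and confidence >= CONFIDENCE_THRESHOLD:
            return text
        if attempt < MAX_ATTEMPTS - 1:
            # Low confidence or no input: ask the question again.
            say("Sorry, I didn't catch that. " + question)
    # Out of attempts: end the interaction politely.
    say("Sorry, I'm having trouble understanding. Goodbye.")
    return None
```

A threshold like this cannot catch false accept errors, because in those cases the system is confident but wrong.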

Menu Style Prompt

A prompt that asks the customer a question intended to elicit a response from a small set of possible options (recommended 5 or fewer). For example, "Hi Mark, you can now hear about the following: your chequing account balance, your savings account balance, or your credit card balance. Which would you like to hear?"
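A menu-style prompt can be generated from a list of options while enforcing the "5 or fewer" guideline. This is a minimal Python sketch. The function name and the choice to raise an error for oversized menus are hypothetical design decisions.

```python
from typing import List

MAX_MENU_OPTIONS = 5  # the recommended upper limit for menu-style prompts


def menu_prompt(intro: str, options: List[str], question: str) -> str:
    """Build a spoken menu such as 'a, b, or c. Which would you like?'"""
    if not 1 <= len(options) <= MAX_MENU_OPTIONS:
        raise ValueError(f"menu prompts should offer 1 to {MAX_MENU_OPTIONS} options")
    if len(options) == 1:
        listed = options[0]
    elif len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        # Comma-separated list with "or" before the last option.
        listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return f"{intro} {listed}. {question}"
```

Calling `menu_prompt("Hi Mark, you can now hear about the following:", ["your chequing account balance", "your savings account balance", "your credit card balance"], "Which would you like to hear?")` reproduces the example prompt above.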

Multimodal Experience

Combining voice, touch, text, images, graphics, audio and video in a single user interface. This enhances user interactions by providing information through both auditory and visual means. In a nutshell, it's both GUI and VUI together. Voice (audio) + Graphical Interface (visual). Example: Fire TV

Natural Language Processing (NLP)

Technology used to aid computers in understanding the human's natural language. NLP lets people and machines talk to each other “naturally”. An effective NLP system is able to ingest what is said to it, break it down, comprehend its meaning, determine appropriate action, and respond back in a language the user will understand.

Natural Language Understanding (NLU)

NLU can be thought of as a subfield of NLP. NLU more specifically deals with machine reading, or reading comprehension. NLU goes beyond the sentence structure and aims to understand the intended meaning of language. While humans are able to effortlessly handle mispronunciations, swapped words, contractions, colloquialisms, and other quirks, machines are less adept at handling unpredictable inputs. Enter NLU.

Open Ended Prompt

A prompt that asks the customer a question intended to elicit a wide range of responses. For example, "What would you like to do?"

Pattern Recognition

A branch of machine learning that utilizes patterns and regularities in data to train systems.

Platform

A group of technologies that are used as a base upon which other applications, processes or technologies are developed. In personal computing, a platform is the basic hardware (computer) and software (operating system) on which software applications can be run.

Real-time Text (RTT)

Text that is transmitted instantly, character by character, as the user types it.

Slots

In many scenarios, intents alone are not enough to fulfill a request. This is where slots come into play. Slots act like traditional form fields in the sense that they can be optional or required, depending on what's needed to complete the request. They are variables that relate back to the intent. For example, in the sentence "order me a large mocha coffee", the words "large" and "mocha" would be classified as slots: details needed to fulfill the user's request.
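Because slots behave like form fields, a voice app can check which required slots are still missing and ask for them one at a time. This minimal Python sketch illustrates that slot-filling loop. The `Slot` class, the slot definitions, and the prompts are hypothetical; platforms such as Alexa offer their own built-in dialog support for this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    required: bool
    prompt: str  # question used to elicit the slot if it is missing


# Hypothetical slot definitions for an "order coffee" intent.
ORDER_COFFEE_SLOTS: List[Slot] = [
    Slot("size", required=True, prompt="What size would you like?"),
    Slot("drink", required=True, prompt="What would you like to drink?"),
    Slot("milk", required=False, prompt="Any particular milk?"),
]


def next_prompt(filled: Dict[str, str], slots: List[Slot]) -> Optional[str]:
    """Return the prompt for the first missing required slot, or None if complete."""
    for slot in slots:
        if slot.required and slot.name not in filled:
            return slot.prompt
    return None
```

With `{"size": "large"}` filled, `next_prompt` asks "What would you like to drink?"; once both required slots are present it returns `None` and the request can be fulfilled, even though the optional `milk` slot is empty.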

Smart Home Skill

Skills that have been built specifically for controlling smart home appliances.

Software

The code that runs on the hardware and makes it useful. Without software, hardware wouldn't have the logic or programs in place to actually do anything; without hardware to run on, software is useless.

Software Development Kit (SDK)

An SDK is a collection of software development tools in one installable package. SDKs make it easier for developers to create apps by bundling the tools they need. For example, if building an app were like building a house, an SDK would be a toolbox made specifically for constructing the kitchen. You could still use other tools, or even build your own, but an SDK offers something specific to solving a problem, or a set of related problems, within that area.

Speech Recognition

The ability of an electronic device to recognize spoken words only, without identifying the individual voice characteristics of the user.

Speech Synthesis Markup Language (SSML)

An XML-based markup language for controlling the speech output of voice applications (like Alexa or Google Assistant). In simple terms, SSML can help Alexa or Google sound more natural. For example, you can add longer breaks between sentences or emphasize a certain word.
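Because SSML is XML-based markup, a response with a pause and an emphasized word is just a string built from elements such as `<speak>`, `<break>`, and `<emphasis>`. This minimal Python sketch builds such a string. The helper function names are hypothetical, and the attribute values each platform accepts can differ, so check the platform's SSML reference before relying on a specific value.

```python
from xml.sax.saxutils import escape

# Hypothetical helper names; <speak>, <break>, and <emphasis> are standard SSML
# elements, but supported attribute values vary by platform.


def plain(text: str) -> str:
    """Escape &, <, and > so ordinary text cannot break the XML markup."""
    return escape(text)


def pause(seconds: float) -> str:
    """A silent pause, e.g. pause(1) -> <break time="1s"/>."""
    return f'<break time="{seconds:g}s"/>'


def emphasize(text: str, level: str = "strong") -> str:
    """Ask the speech engine to stress a word or phrase."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def speak(*fragments: str) -> str:
    """Wrap SSML fragments in the <speak> root element."""
    return "<speak>" + "".join(fragments) + "</speak>"


# A one-second pause between sentences, then a stressed word.
response = speak(
    plain("Welcome back."),
    pause(1),
    plain("Your order is "),
    emphasize("ready"),
    plain("."),
)
print(response)
# <speak>Welcome back.<break time="1s"/>Your order is <emphasis level="strong">ready</emphasis>.</speak>
```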

Text-to-Speech (TTS)

Technology that converts written text into artificially produced speech using specialized software. It is also referred to as "read aloud" technology and is built into nearly every personal digital device today, including smartphones, computers, and tablets.

User Flow

These are paths that users follow through an experience. Flows aren't necessarily linear, and can branch out in different paths.

Utterance

This can be anything the user says. For example, if the user said "order me a large mocha coffee", the entire sentence would be the utterance.

Viewport

On an Alexa-enabled device with a screen or a display, the viewport is the area of the display that the user can see.

Voice Prompt

A recorded message played by interactive voice response (IVR) systems, message-on-hold systems, and other voice processing tools. The goal of the prompt is to guide the user toward their destination, such as hearing the funds in their savings account or finding out the amount owed on their last credit card statement.

Voice Recognition

The ability of an electronic security device to recognize the voice of a particular person.

Voice User Interface (VUI)

A VUI allows users to interact with a system through voice or speech commands using speech recognition technology (Amazon Alexa, Google Assistant, Siri, Cortana, etc.). It is sometimes pronounced "voo-ee".

Voicebot

Programs that automate conversations over the phone or through voice assistants.

Wake Word

A special word or phrase that is meant to activate a given device once said. An example of these words or phrases would be "Alexa", "Hey Siri" and "Hey Google". These are also called "trigger words".

Getting started with Voiceflow

Facebook Community

Join over 5,500 creators building with Voiceflow. Get early access to features, community exclusive perks, and a direct line to our pro users and team.

YouTube Tutorials

Looking to start off right? Check out our series of videos made by Voiceflow and our community on our channel.

Learning Hub

Whether you're new to voice or turning into an expert, the Voiceflow Learning Hub has a series of in-depth walkthroughs of our features to explore.