Consumer-facing AI systems can carry out tasks or services for an individual based on commands or questions.
A credential that represents the end user (resource owner) in another system. A token should identify the user in the other system. The access token is included in the requests sent to your skill if the user has successfully linked their accounts. (Source: Alexa glossary)
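As a rough illustration, the sketch below reads the access token from an incoming Alexa request that has already been parsed into a Python dict. It assumes the token sits at `context.System.user.accessToken`, where the Alexa request format places it once account linking has succeeded. `get_access_token` is a hypothetical helper name, and a production skill would typically rely on an SDK instead of walking the JSON by hand.

```python
from typing import Optional


def get_access_token(request_envelope: dict) -> Optional[str]:
    """Return the linked-account access token from a parsed Alexa request, or None.

    Assumes the token appears under context.System.user.accessToken, which is
    only present after the user has linked their account.
    """
    user = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("user", {})
    )
    # A missing key simply means the account isn't linked (or the request is malformed)
    return user.get("accessToken")
```

If the function returns None, the skill would usually prompt the user to link their account before calling the other system.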
An Alexa Skills Kit feature that lets you connect the identity of the end user with an account in another system. For example, a Car Hailer custom skill for ordering a ride needs to access the Car Hailer service as a specific user. Similarly, a smart home skill for controlling a light needs to connect the Alexa user with an account in the device cloud. (Source: Alexa glossary)
A JSON file that defines your Actions. This file includes information for the Actions directory listing, account linking information, a list of intents that the Actions can handle, and the actual fulfillment endpoints.
A phrase that opens a specific action when spoken to Google Assistant. Also known as the "invocation phrase" or "implicit invocation intent". For example: "Ok Google, open Trivial Pursuit".
A web tool for testing and debugging Actions in real time. The simulator lets you test your Actions on all surfaces that Google Assistant supports, without requiring a physical device.
A development tool from Amazon that lets you create, modify, and delete skills. Coding is required.
The Alexa Developer Console is a conversational platform that allows developers to build, test, distribute, and certify Alexa skills.
This is Amazon's voice-first design language that makes it easy to create visually-rich Alexa skills for millions of Alexa devices with screens. APL enables creators to build interactive voice experiences that include graphics, images, slideshows, and video and to customize them for different device types such as Echo Show, Fire TV and select Fire Tablet devices.
A set of actions or tasks that are accomplished by Alexa, Amazon's voice assistant. Skills are like apps for Alexa. They help customers perform everyday tasks or engage with content naturally through voice.
A collection of APIs, tools and documentation for giving Alexa new capabilities.
A service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. The text-to-speech service uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries. (Source: Alexa glossary)
This is a term used to describe a state where technology is omnipresent and accessible whenever required.
This is code that allows two software programs to communicate with each other.
Stands for "applications". Apps are pieces of software written for a specific platform that are meant to do a particular task. For example, on the iPhone platform, you could create a calculator "app" that utilizes the software and hardware in the iPhone.
An application that uses interactive voice response (IVR) technology to automatically answer, direct, and transfer incoming calls to an extension without the need for a phone operator or receptionist.
Computer technology that can identify and process human speech. It is mainly used to convert spoken words into computer text. Related voice technology is also used to authenticate users by their voice and to perform actions based on instructions defined by the user. Some systems are speaker-dependent and require preconfigured or saved voice samples of the primary user(s), while others work for any speaker. It is also known as Automatic Voice Recognition (AVR).
Programs that automate conversations on the web or in instant messaging apps.
The total amount of mental effort being used in the working memory, or how difficult it is for a user to understand or parse the information being presented to them. (Source: Alexa glossary)
Used to make sure the customer knows Alexa understood them correctly. Types of confirmation include Implicit confirmation and Explicit confirmation. (Source: Alexa glossary)
Conversation design, at its heart, is about teaching computers to communicate like humans, and not the other way around. It’s about making these experiences easy and intuitive, and reducing frustration. At a more practical level, it’s about designing experiences that include conversational interactions, whether that’s through a voice user interface, a voice-forward screen, or a multi-modal device like a mobile phone that may include typing, tapping and swiping.
Refers to the use of messaging apps, speech-based assistants (Amazon Alexa, Google Assistant, etc.), and chatbots to automate communication and enhance machine learning, which can in turn create personalized experiences at scale.
A conversational user interface (CUI) is a platform that houses artificial intelligence-supported voice apps, chatbots, and IVRs to have spoken or written interactions with human users. The goal of a CUI is to mimic human conversation.
When something unexpected happens in the conversation between Alexa and the customer. Types of dialog errors include low-confidence errors, timeouts (silence or no input), and false accepts.
A design system that offers a more flexible way to design customer-centric voice experiences. This system involves writing more scripted dialog between the voice assistant (e.g., Alexa) and the customer so that you can take those conversations and convert them into storyboards.
Dialogflow is a conversational platform that lets developers design and build Google Actions, chatbots, and conversational IVRs. Voiceflow allows you to import your projects into Dialogflow, where you can publish your Actions to Google Assistant. Unlike Voiceflow, Dialogflow requires coding.
The message delivered to a customer when an utterance or technical error occurs during a dialog. (Source: Alexa glossary)
When the customer says a command like "exit" or "stop" to end the interaction.
A prompt that repeats back what Alexa heard and explicitly asks the customer to confirm whether it is correct. For example, the customer says, "Alexa, ask Astrology Daily for my horoscope." Alexa responds, "You wanted a horoscope from Astrology Daily, right?" (Source: Alexa glossary)
When Alexa has mid to high confidence that she correctly understood what the customer said, but she actually misunderstood.
Skills that have been built specifically for Amazon Alexa's 'Flash Briefing' feature, which provides users with news headlines and updates, event information, local weather reports and other forms of short-form content.
A service, app, feed, conversation, or other logic that handles an intent and carries out the corresponding Action.
A set of actions or tasks that are accomplished by Google's voice assistant.
A developer tool that lets you create, maintain, test and publish Actions.
A program interface that uses a computer's graphic capabilities to make it easier to use. GUIs make it possible for users to interact with electronic devices (computers, phones, gaming devices, etc.) through visuals like graphical icons. It is occasionally pronounced "gu-ee".
A happy path is a streamlined path of execution (in a voice app, for example) that follows a default progression of events in which no exceptional or error conditions arise. It is ideal when building the simplest flow of logic through a system or task. Where the happy path falls short is in identifying and planning for unexpected inquiries that land outside the default progression of the event or task.
The physical hardware portion of a platform, such as your physical iPhone. It is a shell that is useless without software giving it instructions for what to do.
A prompt that subtly repeats back what Alexa heard to give the customer assurance that they were correctly understood. In the following example, repeating back the word horoscope is a landmarking technique used to establish trust with the customer but still supports natural dialog. For example, "Alexa, ask Astrology Daily for my horoscope". Alexa would then ask to clarify the request with, "Horoscope for what sign?" (Source: Alexa glossary)
With in-skill purchasing (ISP) for Alexa skills, you can make money through your skills by selling digital products to customers.
When building Google actions, this refers to a feature that lets you assign different weights to intents for matching. If a user query can be matched to multiple intents, Dialogflow (Google's natural language understanding platform) is more likely to trigger an intent if it has a higher priority. (Source: Google design guidelines)
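To make the idea of weighting concrete, here is a minimal sketch in which each candidate intent carries a match score and a priority weight, and the highest weighted score wins. This is a toy built on assumptions, not Dialogflow's actual ranking logic: the `Candidate` class, the multiplication rule, and treating a non-positive priority as "ignore" are all illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    intent: str
    match_score: float  # how well the query matched this intent (0.0 to 1.0)
    priority: float     # hypothetical weight; higher is preferred, <= 0 means ignored


def pick_intent(candidates: List[Candidate]) -> Optional[str]:
    """Return the intent with the highest priority-weighted score (illustrative rule only)."""
    eligible = [c for c in candidates if c.priority > 0]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: c.match_score * c.priority)
    return best.intent
```

With equal priorities the better raw match wins; lowering one intent's priority can flip the outcome even when its raw match is slightly better.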
Tasks your assistant can do for you. Simply put, an intent is the user's intention in a given sentence or command. For example, if the user says "order me a large mocha coffee", the intent is to order coffee. An intent doesn't relate to the specific words "order" and "coffee" but to the goal the user is aiming for, which is ordering a coffee.
Based upon the idea that a computer needs specific information to understand human language. The interaction model provides the necessary information for a computer to understand and process a given voice request or command. This incorporates the use of utterances, intents and slots which all map out a user's spoken input. (see these definitions for more info).
An automated phone system that provides pre-recorded voice responses that can interact with callers, gather information, provide information, and route calls to the appropriate recipients via voice or touchtones on a keypad device.
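A bare-bones routing step for such a system might look like the following. The menu, destination names, and keyword list are hypothetical, and the keyword spotting is deliberately naive; a real IVR would receive speech already transcribed by a speech recognition engine and would use proper intent matching.

```python
import re

# Hypothetical menu: keypad digit -> destination queue.
MENU = {"1": "billing", "2": "technical_support", "0": "operator"}

# Hypothetical spoken keywords mapped onto the same keypad options.
KEYWORDS = {"bill": "1", "billing": "1", "support": "2", "help": "2",
            "operator": "0", "agent": "0"}


def route_call(caller_input: str) -> str:
    """Route a touch-tone digit or a transcribed phrase; return 'reprompt' if nothing matches."""
    text = caller_input.strip().lower()
    if text in MENU:  # caller pressed a key
        return MENU[text]
    for word in re.findall(r"[a-z]+", text):  # naive keyword spotting, first match wins
        if word in KEYWORDS:
            return MENU[KEYWORDS[word]]
    return "reprompt"
```

Returning "reprompt" rather than guessing gives the caller a chance to recover instead of being sent to the wrong queue.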
A device or program enabling a user to communicate with a computer.
The interconnection via the Internet of computing devices embedded in everyday objects, enabling them to send and receive data. Examples of objects that can fall into the scope of Internet of Things include connected security systems, thermostats, cars, electronic appliances, lights in household and commercial environments, alarm clocks, speaker systems, vending machines, and more.
When creating a custom Alexa skill, you will need to provide an invocation name that users will use to open your skill. For example, you might say "Alexa, play Game of Thrones Quiz". The invocation name here would be Game of Thrones Quiz.
When Alexa has low confidence that she correctly understood what the customer said. When this occurs, Alexa cannot proceed in the interaction without asking the question again or ending the interaction.
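One common way to act on recognition confidence is to split it into bands: re-prompt when confidence is low, confirm when it is middling, and proceed when it is high. The sketch below assumes a generic NLU result with a 0-to-1 confidence score; the thresholds are made up for illustration, and Alexa itself does not necessarily expose a score like this to skills.

```python
# Hypothetical confidence bands; real NLU services report confidence differently, if at all.
LOW_CONFIDENCE = 0.4
HIGH_CONFIDENCE = 0.8


def next_action(intent: str, confidence: float) -> str:
    """Decide how to respond to a recognized intent based on its confidence score."""
    if confidence < LOW_CONFIDENCE:
        # Low-confidence error: ask the question again (or end the interaction)
        return "reprompt"
    if confidence < HIGH_CONFIDENCE:
        # Uncertain: explicitly confirm before acting
        return f"confirm:{intent}"
    # Confident enough to proceed, though even this can turn out to be a false accept
    return f"fulfill:{intent}"
```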
A prompt that asks the customer a question intended to elicit a response from a small set of possible options (recommended 5 or fewer). For example, "Hi Mark, you can now hear about the following: your chequing account balance, your savings account balance, or your credit card balance. Which would you like to hear?"
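A small helper can enforce the five-option guideline while building the prompt text. The function name and wording are hypothetical; with the options from the example above it reproduces that prompt.

```python
from typing import List

MAX_OPTIONS = 5  # recommended upper limit for a closed-ended prompt


def closed_prompt(greeting: str, options: List[str]) -> str:
    """Build a closed-ended prompt listing a small set of options."""
    if not 1 <= len(options) <= MAX_OPTIONS:
        raise ValueError(f"expected 1-{MAX_OPTIONS} options, got {len(options)}")
    if len(options) == 1:
        listed = options[0]
    elif len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return (f"{greeting}, you can now hear about the following: {listed}. "
            "Which would you like to hear?")
```

Raising an error for longer lists is one design choice; another would be to split the options across two turns.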
Combining voice, touch, text, images, graphics, audio and video in a single user interface. This enhances user interactions by providing information through both auditory and visual means. In a nutshell, it's both GUI and VUI together. Voice (audio) + Graphical Interface (visual). Example: Fire TV
Technology used to help computers understand human natural language. NLP lets people and machines talk to each other “naturally”. An effective NLP system is able to ingest what is said to it, break it down, comprehend its meaning, determine appropriate action, and respond back in a language the user will understand.
NLU can be thought of as a subfield of NLP. NLU more specifically deals with machine reading, or reading comprehension. NLU goes beyond the sentence structure and aims to understand the intended meaning of language. While humans are able to effortlessly handle mispronunciations, swapped words, contractions, colloquialisms, and other quirks, machines are less adept at handling unpredictable inputs. Enter NLU.
A prompt that asks the customer a question intended to elicit a wide range of responses. For example, "What would you like to do?"
A branch of machine learning that utilizes patterns and regularities in data to train systems.
A group of technologies that are used as a base upon which other applications, processes or technologies are developed. In personal computing, a platform is the basic hardware (computer) and software (operating system) on which software applications can be run.
A special kind of prompt used by Alexa when a response is not heard or clearly understandable, usually in the form of a question after a dialog error has occurred. The general purpose of a re-prompt is to help the customer recover from errors.
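In a custom Alexa skill, the re-prompt is supplied alongside the main speech in the skill's JSON response. The sketch below builds a raw response dict by hand, assuming plain-text speech and the `outputSpeech`, `reprompt`, and `shouldEndSession` fields of the Alexa response format; `build_response` is a hypothetical helper, and most skills would use an SDK response builder instead.

```python
def build_response(speech: str, reprompt: str) -> dict:
    """Build a raw Alexa response that keeps the session open with a re-prompt."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            # Spoken if the microphone stays open and the customer says nothing
            # (or nothing that maps to an intent)
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            "shouldEndSession": False,
        },
    }
```

For example, `build_response("Which sign would you like a horoscope for?", "Sorry, which zodiac sign? For example, Taurus or Leo.")` asks the question once and has a more helpful follow-up ready if the first attempt fails.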
Text that is transmitted in real time on a device as the user speaks.
A slot that contains values that are necessary for Alexa to complete the user's request. For example: "Alexa, ask Astrology Daily for the horoscope for Taurus." Without the name of the specific zodiac sign, Astrology Daily cannot provide a horoscope. If the user does not provide a value for a required slot, you must ask the user for that slot value. (Source: Alexa Glossary)
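Here is a minimal sketch of handling a required slot in a raw Alexa request: if the slot has no value, the skill responds with a `Dialog.ElicitSlot` directive to ask for it. The slot name `sign` and the helper names are hypothetical, and the sketch assumes the skill's interaction model defines a dialog model that includes this slot.

```python
from typing import Optional


def get_slot_value(request_envelope: dict, slot_name: str) -> Optional[str]:
    """Return a slot's spoken value, or None if the user didn't provide one."""
    slots = request_envelope.get("request", {}).get("intent", {}).get("slots", {})
    return slots.get(slot_name, {}).get("value")


def speak(text: str, end_session: bool, directives=None) -> dict:
    """Build a minimal raw Alexa response (hypothetical helper)."""
    response = {"outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": end_session}
    if directives:
        response["directives"] = directives
    return {"version": "1.0", "response": response}


def handle_horoscope(request_envelope: dict) -> dict:
    sign = get_slot_value(request_envelope, "sign")  # hypothetical required slot
    if sign is None:
        # Required slot is empty: ask for it before continuing
        return speak("Horoscope for what sign?", False,
                     [{"type": "Dialog.ElicitSlot", "slotToElicit": "sign"}])
    return speak(f"Here is today's horoscope for {sign}.", True)
```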
In many scenarios, intents alone are not enough to fulfill a request. This is where "slots" come into play. Slots can be thought of as particular pieces of information that you have told the assistant to look for when the user is giving their response. In the utterance "order me a large mocha coffee," we want our assistant to look out for the coffee size and type. These are the slots we are looking to capture from the user's utterance. In the above utterance, we would assign a size slot to capture 'large' and a type slot to capture 'mocha.'
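The coffee example can be sketched as a toy interaction model in which a regular expression stands in for the NLU engine. The intent name `OrderCoffeeIntent` and the slot value lists are assumptions for illustration; real assistants use trained models rather than a single pattern, so this only shows how one utterance maps to an intent plus slot values.

```python
import re
from typing import Optional

# Hypothetical slot value lists for the two slots.
SIZES = ["small", "medium", "large"]
COFFEE_TYPES = ["mocha", "latte", "espresso", "cappuccino"]

# One sample utterance pattern with named groups acting as slots.
ORDER_COFFEE = re.compile(
    r"order me an? (?P<size>" + "|".join(SIZES) + r") "
    r"(?P<type>" + "|".join(COFFEE_TYPES) + r") coffee"
)


def parse_utterance(utterance: str) -> Optional[dict]:
    """Map an utterance to an intent name plus slot values, or None if no match."""
    match = ORDER_COFFEE.fullmatch(utterance.strip().lower())
    if match is None:
        return None
    return {"intent": "OrderCoffeeIntent", "slots": match.groupdict()}
```

Calling `parse_utterance("order me a large mocha coffee")` yields the `OrderCoffeeIntent` intent with a size slot of "large" and a type slot of "mocha".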
A smart city is an urban area that uses different types of electronic Internet of things (IoT) sensors to collect data and then uses insights gained from that data to manage assets, resources and services efficiently. This includes data collected from citizens, devices, and assets that is processed and analyzed to monitor and manage traffic and transportation systems, power plants, utilities, water supply networks, waste management, crime detection, information systems, schools, libraries, hospitals, and other community services.
A smart home or smart house is the use of devices in the home that connect via a network, most commonly a local LAN or the internet. It uses devices such as sensors and other appliances connected to the Internet of things (IoT) that can be remotely monitored, controlled or accessed and provide services that respond to the perceived needs of the users.
Skills that have been built specifically for controlling smart home appliances.
A smart speaker is a type of speaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation.
The code that runs the hardware and makes it useful. Without software, hardware wouldn't have the logic or programs in place to actually do anything. Without hardware to run on, software is useless.
An SDK is a collection of software development tools in one installable package. They make it easier for developers to create apps by packaging the necessary tools needed. For example, if you were to build a house, an SDK would include a toolbox specifically for constructing the kitchen. You could still use other tools, or even build your own, but an SDK offers something specific to solving problems or a theme of problems within that area.
The ability of an electronic device to recognize spoken words only and not the individual voice characteristics of the user.
Speech Synthesis Markup Language (SSML) is a markup language used to improve the speech output of voice applications (like Alexa or Google Assistant). In simple terms, SSML can help Alexa or Google sound more natural. For example, you can add longer breaks between sentences or emphasize a certain word.
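For instance, pauses and emphasis correspond to the SSML `<break>` and `<emphasis>` elements inside a `<speak>` root. The helper below is a hypothetical sketch that joins sentences with pauses and optionally emphasizes a word, XML-escaping the text so characters such as `&` don't break the markup. Supported tags and attribute values vary by platform, so check them against Alexa's or Google's SSML reference.

```python
from typing import List, Optional
from xml.sax.saxutils import escape


def to_ssml(sentences: List[str], pause: str = "1s", emphasize: Optional[str] = None) -> str:
    """Join sentences with explicit pauses and optionally emphasize one word.

    Emphasis is applied to the first occurrence of the word in each sentence
    using a simple substring match.
    """
    parts = []
    for sentence in sentences:
        text = escape(sentence)  # escapes &, <, > so the result stays valid XML
        if emphasize:
            word = escape(emphasize)
            text = text.replace(word, f'<emphasis level="strong">{word}</emphasis>', 1)
        parts.append(text)
    return "<speak>" + f'<break time="{pause}"/>'.join(parts) + "</speak>"
```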
The system persona is the conversational partner created to be the front end of the technology that the user will interact with directly. Defining a clear system persona is vital to ensuring a consistent user experience. Otherwise, each designer will follow their own personal conversational style and the overall experience will feel disjointed. (Source: Google design guidelines)
Converting human language into artificially produced speech using specialized software. It is also referred to as "read aloud" technology. It works in nearly every personal digital device nowadays, including smartphones, computers and tablets.
These are paths that users follow through an experience. Flows aren't necessarily linear, and can branch out in different paths.
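One way to reason about a flow that branches is to model it as a graph of steps and enumerate the paths through it. The horoscope flow below is hypothetical; the first path it returns is the happy path, and the others are the branches a designer also needs to script.

```python
from typing import Dict, List, Optional

# Hypothetical flow: each step lists the steps it can lead to.
FLOW: Dict[str, List[str]] = {
    "launch": ["ask_sign"],
    "ask_sign": ["read_horoscope", "reprompt"],
    "reprompt": ["read_horoscope", "exit"],
    "read_horoscope": ["exit"],
    "exit": [],
}


def all_paths(flow: Dict[str, List[str]], start: str,
              path: Optional[List[str]] = None) -> List[List[str]]:
    """Enumerate every path from start to a terminal step (assumes the flow has no cycles)."""
    path = (path or []) + [start]
    if not flow[start]:  # terminal step reached
        return [path]
    paths: List[List[str]] = []
    for nxt in flow[start]:
        paths.extend(all_paths(flow, nxt, path))
    return paths
```

A real flow with loops (for example, repeated re-prompts) would need cycle detection or a cap on path length.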
A user persona is a specific, but brief, description of the type of user who will interact with your voice app. Think of a few people you expect to use your skill or action. Try to have 2-3 different types, e.g., a millennial vs. a working parent. These user personas will help you avoid designing only for yourself and your goals. This ultimately helps you create authentic dialogs and more engaging experiences for customers. (Source: Google design guidelines)
This can be anything the user says. For example, if the user said "order me a large mocha coffee", the entire sentence would be the utterance.
On an Alexa-enabled device with a screen or a display, the viewport is the area of the display that the user can see.
A recorded message that is played by interactive voice response (IVR) systems, message-on-hold systems, and other voice processing tools. The goal of the prompt is to guide the user toward their destination, for example checking the funds in their savings account or finding out the amount they owe on their last credit card statement.
The ability of an electronic security device to recognize the voice of a particular person.
A VUI allows users to interact with a system through voice or speech commands using speech recognition technology (Amazon Alexa, Google Assistant, Siri, Cortana, etc.). It is occasionally pronounced "v-ew-ee".
Programs that automate conversations on phones or voice assistants.
A special word or phrase that is meant to activate a given device once said. An example of these words or phrases would be "Alexa", "Hey Siri" and "Hey Google". These are also called "trigger words".
Wearable technology, or wearables, are smart electronic devices (electronic devices with microcontrollers) that are worn close to or on the surface of the skin, where they detect, analyze, and transmit information such as body signals (e.g., vital signs) and/or ambient data, and in some cases allow immediate biofeedback to the wearer.