When raising money for Voiceflow in 2018, the most common question I heard from investors was “What is the killer voice app?” I hadn’t heard the term “killer app” before, but it quickly became clear they were referring to the blockbuster, billion-dollar apps that now define the mobile app ecosystem: think the Angry Birds, Instagrams, and Ubers of the world.
Fast forward a year, and I’m still working toward a true answer, so I’ve written this post.
We’ve been living in the era of mobile for more than a decade now, and we are quick to compare voice assistants in their infancy against mobile in its maturity. Platforms follow the curve of the innovator’s dilemma, in which today’s niche technologies, like voice, become tomorrow’s dominant platforms more slowly than expected when measured in months, but faster than expected when measured in years.
This post combines a few ideas we’ve been thinking about at Voiceflow. It discusses:
- What “voice” and voice apps are
- Why we haven’t had a blockbuster voice app yet
- Why I think every app will be a voice app in the near future
If you asked the average person on the street today what an “app” is, they would more often than not point to one of the square icons on their phone. Asked what a “voice app” is, they’d likely land somewhere around a mobile app with optional voice commands. There isn’t a good, or even remotely agreed-upon, definition of a “voice app”, but the average guess isn’t far off.
"A voice app is an application that utilizes voice as a primary input interface."
This definition is simple and encompasses voice apps across all platforms. Its most important feature is that it allows apps with touch screens, keyboards, and other interfaces to still be “voice apps”. The only criterion that matters is that voice interaction is a primary interface, not an afterthought, a gimmick, or a shortlist of commands. A simple test is whether you could use the app successfully and wholly with only your voice. If so, it’s a voice app, regardless of whether users actually choose to use their voice.
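The “use it wholly with only your voice” test can be sketched as a check over an app’s tasks. This is a minimal illustration under an assumed, hypothetical model in which each task records the interfaces that can complete it end to end; `Task`, `is_voice_app`, and the two sample apps are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A user-facing task and the interfaces that can complete it end to end (hypothetical model)."""
    name: str
    interfaces: set[str] = field(default_factory=set)


def is_voice_app(tasks: list[Task]) -> bool:
    """Voice-only test: every task must be completable with voice alone.

    Whether users actually choose to speak doesn't matter; only capability counts.
    """
    return bool(tasks) and all("voice" in task.interfaces for task in tasks)


# Hypothetical example apps.
ride_app = [Task("search destination", {"touch", "voice"}), Task("pay", {"touch"})]
recipe_app = [Task("read next step", {"voice", "touch"}), Task("set timer", {"voice", "touch"})]

print(is_voice_app(ride_app))    # False: paying needs touch, so voice is an add-on
print(is_voice_app(recipe_app))  # True, even if most users end up tapping
```

Note that the recipe app still supports touch everywhere; under the definition above, that doesn’t stop it from being a voice app.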
It’s become common for people to say voice is the next “platform shift”, meaning the next major way for people to interact with technology on a daily, if not hourly, basis. The previous platform shift was from desktop to mobile, and many now say the next is from mobile to voice. The problem is that voice is an interface shift, not a platform shift. Platforms, in the sense of a platform shift, are the foundational technologies that let app makers connect with app consumers: think smartphones, computers, and voice assistants like Alexa or Google Assistant. Interfaces are the technologies that let us interact with platforms, such as keyboards, touch screens, and our voice. An interface shift has occurred when the way we expect to interact with mainstream technology changes. Today you’re unlikely to see a phone with a keyboard instead of a touchscreen, or a computer with only a command line instead of graphics: these were interface shifts.
I believe conversational AIs, like Alexa and Google Assistant, are the next platform shift, while voice interfaces represent the next interface shift. When shifts happen, the previous technology doesn’t go away; it settles into the use cases its unique abilities suit best, while the new platform or interface becomes the default. We still use keyboards and desktops alongside our touchscreens and mobile phones; what has shifted is the technology we rely on most and that captures our collective imagination.
Voice will not replace mobile, because voice is not a platform; it’s an interface. Mobile is likely to remain the largest platform for voice interface use, both now and in the near future. What will dethrone mobile, however, are omni-channel voice assistants, but that’s a topic for another post.
Platforms, like most technologies, follow Clayton Christensen’s famous theory of “The Innovator’s Dilemma”. New platforms, such as personal computing and mobile, are created to solve a particular use case that is impossible on pre-existing platforms. At launch, these new platforms solve only their niche use case and thus have far less functionality than the mainstream platforms of the day. Because of this apparent lack of functionality compared to the mature, dominant platforms, they are quickly written off as not useful. Personal computers were written off as useful only for computation, smartphones for mobile email, and voice assistants for questions, weather, and music.
Over time, the dominant platform saturates in functionality and sees diminishing returns with every new version (sounds a lot like mobile phones). Meanwhile, the new platform keeps adding functionality at a rapid pace, expanding its capabilities and range of use cases. Soon, the new platform reaches parity with the previous one for many use cases, like commerce and entertainment, and then it suddenly shifts into the foreground as dominant. Platforms rarely go away, and you likely still have a desktop as well as a mobile phone. However, the platform that is top of mind for the public changes hands as the new and previous platforms each fortify their hold on platform-specific “killer apps”: think desktop and photo editing, or mobile and Snapchat.
It’s too early to tell what the killer apps for voice will be, because voice assistants are still adding functionality at a rapid pace. As features are added, new use cases for “killer apps” will be unlocked, much as GPS on mobile unlocked Uber, cameras unlocked Snapchat, and vector graphics on computers unlocked Adobe Illustrator.
We suffer from a strong recency bias when searching for killer apps on voice assistants against the backdrop of today’s mobile apps, which have had a decade to mature. Voice assistants are still limited in functionality and have tremendous room to grow as future platform additions unlock “killer apps”.
Interestingly, many of the best early use cases for voice assistant apps are business-facing and thus have less visibility despite their enormous impact. Many businesses are creating simple voice apps to perform database queries or run calculations, benefiting from the speed of input and the lack of physical touch. These aren’t the “Angry Birds” that defined an entire platform, but they’re making a tangible difference across a wide range of industries.
It’s hard to imagine exactly what the killer apps will be, or when and where we’ll see them emerge. If it were easy, we’d all be rich, or too late. Investor track records with previous platforms show that no one can consistently predict the killer apps for a new platform. Indeed, spotting successful use cases is probably better served by the phrase “I’ll know it when I see it”. What we can do, however, is think through a simple framework for where to look.
The best place to start looking for killer use cases on a new platform is the limitations of previous platforms and what the new platform makes possible. It’s not enough for the new platform to do a previous platform’s task slightly better. A killer use case for a new platform does what previous platforms could not do, or does it at least 10x better (the rule of magnitude). For voice assistants, current functionality already enables use cases where you can’t use your hands or don’t have line of sight to a screen, such as driving, cooking, or manual labor. Tasks that are 10x better with voice assistants are highly repetitive ones where you know exactly what you want, such as asking a question, changing a song, or making a routine purchase like coffee for your morning commute.
There’s a popular, simple idea that every platform will have a similar successful startup category, i.e., “we’re the X for [platform]”. I don’t believe this is true, as every platform has unique challenges and opportunities; however, it makes for easy startup pitches and is justified by hindsight bias and pattern recognition. When a voice app company pitches itself as an existing mobile or web company “for voice”, voice clearly isn’t what makes the idea possible: the company being mimicked already exists without it. With this in mind, we have to think long and hard about whether being a voice app gives the service a true 10x user experience over existing platforms. If adding a voice interface improves performance but not by an order of magnitude, the startup opportunity is likely small, and the original company will eventually acquire a voice app or build one itself. A startup that doesn’t deliver a 10x improvement will have a hard time winning customers from other platforms, and thus won’t be able to build a large company fast enough to fend off the original company when it launches its own voice interface.
The platform shift from stationary to mobile computing was enormous, and it unlocked millions of use cases and opportunities. The shift from mobile to voice assistants is a less dramatic leap up front: without additional functionality, voice assistants mostly move technology from our hands to our pockets, whereas mobile moved technology from the office to anywhere. Voice assistants today are still a profound change, but they won’t unlock the same scale of opportunities as quickly as mobile computing did. Their impact will deepen as further features are added, notably visuals, authentication, and true proactivity.
Today, voice assistants are mostly reactive systems similar to mobile and desktop operating systems, the only major difference being that they’ve adopted a voice interface. Longer term, voice assistants have the potential and the proper foundation to become the first proactive operating systems, truly living up to their “assistant” name and delivering massive impact. These conversational AIs are cloud-based operating systems and can thus live on any device with an internet connection, becoming the foundational platform we interact with daily through a proactive, truly helpful assistant. These omni-channel assistants will let companies carry consistent, contextually intelligent conversations with consumers across channels (SMS, Messenger, IVR, etc.) as they go about their day. This future is far away, but closer than we think.
Today, a voice app startup’s opportunity to compete with existing solutions comes from great conversation design and from leveraging the unique capabilities of a voice app living on a voice assistant. Non-linear app structures, being entirely cloud-based, and being accessible across millions of device types are all strengths that, combined with strong design, can yield 10x improvements. An example that immediately comes to mind is DriveTime, which is building the HQ Trivia of voice. Their team can build up voice-specific content and strong conversation design practices for millions of commuters who cannot reasonably use a visual-only HQ Trivia on the go: a 10x improvement. By the time HQ Trivia or other visual mobile content producers become voice apps, they’ll be far behind DriveTime, which will have developed an experienced team with great conversation design.
An interface is a way to input and output data between user and computer. Each interface has different pros and cons: voice has fast input and slow output. Imagine you’re choosing a Netflix movie with a visual interface. You would have to type using the on-screen keyboard everyone hates, but you’d then receive a visual list of all the matching movies at once. This experience has slow input and fast output. Conversely, with only voice, you could quickly say the movie you want to search for, but Netflix would then have to read all the results aloud: fast input, slow output. This slow output restricts where voice interfaces are usable, because any result longer than a sentence or so creates cognitive load as the user tries to remember their options.
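The asymmetry in the Netflix example can be made concrete with back-of-the-envelope arithmetic. The rates below are rough assumptions chosen for illustration, not measurements: speech and text-to-speech playback at about 150 words per minute, silent reading at about 250 words per minute, and on-screen keyboard entry with a remote at about one character per second. The query and the ten four-word titles are hypothetical.

```python
# Rough, assumed rates for illustration only (not measurements).
SPEAK_WPM = 150              # assumed speaking rate
LISTEN_WPM = 150             # assumed text-to-speech playback rate
READ_WPM = 250               # assumed silent reading rate
REMOTE_CHARS_PER_SEC = 1.0   # assumed on-screen keyboard typing with a remote


def seconds_for_words(words: int, wpm: float) -> float:
    """Time to speak, hear, or read `words` words at `wpm` words per minute."""
    return words * 60 / wpm


def voice_session(query_words: int, result_words: int) -> tuple[float, float]:
    """(input seconds, output seconds) for a voice-only search."""
    return seconds_for_words(query_words, SPEAK_WPM), seconds_for_words(result_words, LISTEN_WPM)


def visual_session(query_chars: int, result_words: int) -> tuple[float, float]:
    """(input seconds, output seconds) for a visual-only search typed with a remote."""
    return query_chars / REMOTE_CHARS_PER_SEC, seconds_for_words(result_words, READ_WPM)


query = "space movies from the nineties"  # hypothetical search
result_words = 10 * 4                     # ten titles of about four words each

voice_in, voice_out = voice_session(len(query.split()), result_words)
visual_in, visual_out = visual_session(len(query), result_words)

print(f"voice:  {voice_in:.1f}s in, {voice_out:.1f}s out")    # voice:  2.0s in, 16.0s out
print(f"visual: {visual_in:.1f}s in, {visual_out:.1f}s out")  # visual: 30.0s in, 9.6s out
```

The exact numbers depend entirely on the assumed rates, but the shape holds: voice wins on input, the screen wins on output, and the gap on output grows with every result that has to be read aloud.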
Further, voice interfaces inhibit browsing, which is key to great user experiences. For example, you wouldn’t order an Uber by voice unless you knew exactly where you needed to go, because voice interfaces have a built-in turn timer that cuts you off when you stop speaking. If you had to check a map, ask a friend where to go, or do anything else that takes more than a few seconds mid-conversation, the interface would time out.
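The turn timer can be sketched as a single decision made when the user’s reply arrives. This is a simplified, hypothetical model: the 8-second listening window, `Turn`, `handle_reply`, and the three outcome strings are invented for illustration, and real assistants choose their own timeouts and reprompt behavior. The screen case is included only to contrast a voice-only turn with one where the options stay visible.

```python
from dataclasses import dataclass

# Hypothetical listening window; real assistants choose their own values.
DEFAULT_TIMEOUT_S = 8.0


@dataclass
class Turn:
    prompt: str
    has_screen: bool = False


def handle_reply(turn: Turn, seconds_until_reply: float, timeout_s: float = DEFAULT_TIMEOUT_S) -> str:
    """Decide what happens when the user takes `seconds_until_reply` to answer."""
    if seconds_until_reply <= timeout_s:
        return "continue"  # reply arrived inside the listening window
    if turn.has_screen:
        return "wait"      # options stay visible, so the app can keep waiting
    return "timeout"       # voice-only: the microphone closes and the turn is lost


destination = Turn("Where would you like to go?")
print(handle_reply(destination, 3.0))   # continue: the user already knew the address
print(handle_reply(destination, 20.0))  # timeout: the user stopped to check a map
```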
If our app has both a screen and a voice interface, we get the best of both worlds: fast input and fast output, with a flexible conversation timer. Users can go as fast as they want using only voice, without touching the device, or slow the conversation down and see their options visually. This kind of interface is “voice-first”: the first mode of interaction is voice, and the second is visual. Many voice-first devices pair a touchscreen with a voice interface, allowing a completely visual interaction should the user not want to use voice, perhaps because they’re in a public setting.
As voice interfaces grow in prominence, we’ll see the rise of multi-interface apps that can handle voice-only, voice-first, and visual-only interactions. These multi-interface apps will provide a superior user experience that adapts to the user’s situation, and they will become the new standard for app development.
Designing multi-interface apps requires teams that understand both traditional UI/UX design and CxD (conversation design). Truly multi-interface apps demand a deep understanding of conversation design, because development has to begin with conversation design and layer on visuals later. The current approach to voice app design is the reverse: design the visual app first and layer the voice interface on later. That approach will not work for multi-interface apps, because visual app design is linear whereas conversations are not. To create a good multi-interface design, product teams will need to first design the non-linear conversation paths that handle voice-only interaction, then layer in visuals for voice-first interaction, and finally create visual paths for visual-only interaction.
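The conversation-first ordering can be sketched as a small data model: every node must work with speech alone, visuals are an optional layer added afterward, and a single renderer serves all three interaction modes. This is a minimal illustration assuming a hypothetical coffee-ordering app; `Mode`, `Node`, `GRAPH`, `INTENTS`, `next_node`, and `render` are invented names, not Voiceflow’s or any assistant’s API. The non-linear part is that intents are global, so the user can jump from any node to any other instead of following a fixed next-screen sequence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    VOICE_ONLY = "voice_only"
    VOICE_FIRST = "voice_first"
    VISUAL_ONLY = "visual_only"


@dataclass
class Node:
    speech: str                         # designed first: every node must work by voice alone
    visual: Optional[list[str]] = None  # layered on later: optional on-screen options


GRAPH = {
    "welcome": Node("Welcome! You can order a coffee or check your rewards.", ["Order coffee", "Rewards"]),
    "order": Node("What would you like? Your usual is a large latte.", ["Large latte", "Something else"]),
    "rewards": Node("You have two free drinks.", ["Use a free drink"]),
}

# Global intents reachable from every node: the non-linear part of the design.
INTENTS = {"order coffee": "order", "rewards": "rewards", "start over": "welcome"}


def next_node(utterance: str, current: str) -> str:
    """Route by intent regardless of where the user is; stay put if nothing matches."""
    return INTENTS.get(utterance.lower().strip(), current)


def render(node: Node, mode: Mode) -> dict:
    """Produce one response per interaction mode from the same conversation node."""
    if mode is Mode.VOICE_ONLY:
        return {"say": node.speech}
    if mode is Mode.VOICE_FIRST:
        return {"say": node.speech, "show": node.visual or []}
    return {"show": node.visual or [node.speech]}  # visual-only: fall back to text if no visual layer


state = next_node("Rewards", "order")         # jump straight from ordering to rewards
print(state)                                  # rewards
print(render(GRAPH[state], Mode.VOICE_ONLY))  # {'say': 'You have two free drinks.'}
print(render(GRAPH[state], Mode.VISUAL_ONLY)) # {'show': ['Use a free drink']}
```

Because the visual layer is optional on every node, a node designed only for voice still renders in all three modes, which is the property the design order above is meant to protect.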
In the near future, every app will become a voice app, and eventually a multi-interface app. As the world emerges from a global pandemic, users will demand better experiences that adapt to how they want to interact with technology in any given moment and context. People won’t want to touch dirty elevator buttons, step out of the shower to change a song, or pull out their phone to order a coffee: in these cases and many more, it’s easier to talk than type.
We can’t easily predict today what the billion-dollar voice apps will look like, or when they’ll come. All we know is that there are use cases today that voice makes possible, and that as the platforms grow in functionality, so will the number of “killer” apps.
If you have any questions or would just love to talk about the future of voice, feel free to email me directly at firstname.lastname@example.org