1. Trained vs natural interface
Mobile interfaces are set by designers on behalf of machines and learned by humans. The first time you pick up a smartphone, you learn how to navigate its interface and experience. Because of this, the best mobile user interfaces (UIs) are uniform in general structure but add unique twists where they can. If your UI is too far from the norm, people have a hard time learning how to use it. This leads many mobile apps to follow the conventions and norms set out by the industry. Every now and then a new generally accepted design convention arises, but broadly speaking it’s all the same.
Voice user interfaces (VUIs) are different because it’s all about teaching computers how to talk to humans, and that’s a huge shift. The best VUI should be so intuitive that anyone can start using it without knowing what the voice app even does.
The way humans talk is topic-driven and often spontaneous. Think about the last few times you greeted someone you know: odds are the greeting was different each time, even if the topic was the same. Because of this, VUIs must be anything but uniform. VUIs must be as spontaneous and free-flowing as humans whilst retaining their ability to converse over defined topics. In the UI world, this would be akin to changing the color and position of a login button every time the user returns to keep the “conversation” fresh.
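To make the idea of a non-uniform greeting concrete, here is a minimal sketch in Python. It assumes a small, hand-written list of greeting variants and simply avoids repeating the one used last time. The names GREETINGS and pick_greeting are hypothetical and are not part of any voice platform’s API.

```python
import random

# Hypothetical greeting variants; a real voice app would author many more.
GREETINGS = [
    "Hey, good to hear from you again.",
    "Welcome back!",
    "Hi there, what can I do for you?",
]


def pick_greeting(last_used=None, rng=random):
    """Return a random greeting that differs from the one used last time."""
    candidates = [g for g in GREETINGS if g != last_used]
    # Fall back to the full list if filtering left nothing to choose from.
    return rng.choice(candidates or GREETINGS)


if __name__ == "__main__":
    previous = None
    for _ in range(3):
        previous = pick_greeting(previous)
        print(previous)
```

The topic (a greeting) stays fixed while the wording changes from one session to the next, which is the property the paragraph above describes.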
2. Complex vs simple to build
Mobile apps are relatively large, complex projects, which makes them costly to design and very hard to build well.
The lack of visuals and of a complex frontend makes voice apps straightforward and easy to build. The heavy lifting of natural language processing and understanding is all performed by the platforms.
What people forget about voice apps, however, is that design is the most important element. It’s easy to build a conversation, but it is very hard to design a good one. The industry is still figuring out a loosely held set of voice interface standards.
3. Device vs cloud-based
Mobile apps are heavy. Every mobile app has to be downloaded, which takes up space on your phone and time to download. Because of this, people have fewer apps on their phone. According to App Annie, the average number of mobile apps used per month in 2017 was 40. Personally, I was shocked by that number, as I legitimately use fewer than 10.
Voice apps are light because they are cloud-based. This means they don’t live on the device itself but are accessed from the cloud whenever needed. This approach dramatically reduces the effort a user has to weigh when deciding whether to use a service. In theory, a user can “have” as many apps on their voice assistant as they have questions for it.
4. Deep vs fast engagement
People in North America spend an average of 45 minutes per day on their smartphones. This is because smartphone apps are meant to have deep engagement that can entertain or help the user for a long time. Paired with the heavy nature of mobile apps (see point above), this leads to the conclusion that mobile apps are meant to be few in number on your phone, but highly engaging.
Voice apps are not meant to be engaging but highly functional and fast. When ordering a coffee from Starbucks, you don’t want to have a conversation; you just want a coffee. This, together with the fact that voice apps are cloud-based, points to voice apps being vast in number and shallow in engagement. You want to be able to order by voice from any coffee shop you see, but you don’t want to have a conversation with that coffee shop.
5. Explicit vs implicit discovery
Mobile apps are discovered through stores like the iOS App Store and the Android Google Play store. This marketplace approach makes sense for these platforms because the apps are highly engaging. Usually, mobile apps have to compete or market externally in order to get people to download them. This model works, however, because the app is highly engaging, and thus highly monetizable, once it has been downloaded.
Voice app discovery is broken right now (mid-2019). Voice apps are light and shallow in functionality but wide in number. The problem is that the current discovery system is explicit, the same as an app store: you have to know what you want and find it by name. This manifests within the Alexa Skill store and Google Action store. However, this approach is completely backward and has caused many critics to push back on voice harder than they realistically should.
The future of voice app discovery is implicit discovery, where you state your intention and the platform discovers a voice app to serve your need. An example would be asking to go to the airport and your voice assistant finding you a ridesharing app appropriate for your needs.
When this shift from explicit to implicit happens, and it’s already happening slowly, voice apps won’t be seen as broken mobile apps but instead as nodes of functionality that serve users within the voice-controlled and personalized search engines that are Alexa and Google Assistant.
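As a rough illustration of implicit discovery, the sketch below matches a spoken request against a tiny registry of voice apps by keyword overlap. Everything in it is an assumption made for illustration: the registry, the app names, and the discover_app function are hypothetical, and real platforms like Alexa and Google Assistant would rely on far richer language understanding and ranking than this.

```python
# Hypothetical registry mapping voice apps to the keywords they can serve.
VOICE_APPS = {
    "ridesharing_app": {"airport", "ride", "taxi"},
    "coffee_app": {"coffee", "latte", "espresso"},
    "meditation_app": {"meditate", "meditation", "relax"},
}


def discover_app(utterance):
    """Return the app whose keywords best overlap the utterance, or None."""
    # Lowercase and strip simple punctuation before splitting into words.
    cleaned = utterance.lower().replace("?", "").replace(".", "")
    words = set(cleaned.split())
    best_app, best_score = None, 0
    for app, keywords in VOICE_APPS.items():
        score = len(words & keywords)
        if score > best_score:
            best_app, best_score = app, score
    return best_app


if __name__ == "__main__":
    print(discover_app("Take me to the airport"))
```

The user never names an app. They state an intention, and the lookup decides which app, if any, should handle it. Explicit discovery would instead require the user to ask for the ridesharing app by name.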
6. SMBs don’t need vs (might) need one
Every big consumer-facing brand today has a mobile app, and sometimes several. There was a time when every business was expected to have a mobile app, with small pizzerias being sold by agencies on the idea of “the new website”. However, that myth has been generally debunked thanks to aggregator platforms like Yelp and Google, and to time. Most small businesses need a landing page that makes them accessible by search (even less so with Google’s business listings today). However, given the explicit discovery mechanics of mobile, where you have to know what you’re looking for, mobile apps didn’t serve as discovery, and most small businesses don’t have any reason to have a deeply engaging mobile app either. A landing page suffices to solve all these needs and more. In short, not everyone needs a mobile app.
More businesses will need voice apps in the short term to be accessible by voice search once it shifts to implicit discovery. However, two factors will still shape the outcome:
1. The role of aggregators on AI assistants
It could well be the case that aggregators like Yelp and DoorDash, or the platforms themselves, act as the conversational bridge between small businesses and consumers. For example, you could ask Alexa to order you a large pizza from the shop down the street and, provided the shop has filled in its information in Alexa’s database, it would be able to fulfill the order. This would be a heavy lift for Alexa, however, and would likely fall to aggregation platforms like DoorDash and Yelp, which already have relationships with these small businesses.
There is a future where for some reason these aggregators do not fulfill this need. In that case, there is room in the market for a “Wix of voice” to help small businesses build voice apps, but I find that unlikely.
2. How fast explicit discovery shifts to implicit discovery
Imagine if today Google Assistant performed a search query and returned the perfect third-party voice app every time you asked for something you needed. This is the world of implicit discovery; however, it seems to be further away than many hope. Until implicit discovery is solved, there is no point in being a small business with a voice app if your goal is to attract customers. That said, we’ve seen use cases where it’s great for a small business to build a voice app to retain and engage existing customers. An example we’ve seen at Voiceflow is a meditation studio creating a series of take-home lessons for existing customers to practice at home on their voice assistants. That’s a fantastic use case and a great idea for a small business. Building highly verticalized, functional use cases for voice is the only reason a small business should build out a voice app in 2019, because without implicit discovery it certainly won’t bring in any additional customers.
Voice apps are often ridiculed when measured against the benchmark of mobile apps. They have shallow engagement, poor discovery, and even poorer retention.
However, this ridicule is silly because it’s too early to know what voice apps will eventually become. In the early days of mobile when smartphones were benchmarked against desktops, they were considered to be underpowered and difficult to use for anything past a simple email. We now know this to be a dated opinion.
We think the same way about voice apps. It’s early days, but there are certainly shining examples of a platform shift just getting started.