The 4 layers every AI assistant needs (hint: more than just an LLM)

Is the wave of large language model (LLM) interest about to break? Perhaps not. But I do see a plateau ahead in response to people’s newfound understanding that ChatGPT can’t do everything, and that it’s just one slice of a healthy AI assistant tech stack. 

Google Searches for “LLM” over the past 12 months

The way I see it, every assistant still has four foundational layers or “jobs to be done.” Here’s how those break down, and also how to explain them to others within your company who may be thinking “AI everywhere all at once.”

Yes, LLMs are the future. But they aren’t the entire present.

Every assistant must understand four things: the world, company, goal, and customer context

To understand these four layers, it’s actually helpful to think about hiring a new trainee. If I’m a Starbucks manager who hires a barista, I know they come ready with general knowledge about the world and are skilled at interfacing with people. That’s a great foundation. But to be effective in the role, they’ll need more than that. They’ll need:

  1. General knowledge—people can ask them about news and the weather. 
  2. Company knowledge—making drinks, cleaning up, rules of work.
  3. Users’ goals—kindly decoding caffeine-deprived maniacs’ needs.
  4. Users’ context—remembering customers and their orders.

That first item—general knowledge—is what conversational assistants were lacking up until now. If people asked oddball questions, they got confused. That’s what LLMs can now provide. But again, LLMs cannot do everything. And these layers actually work together in stages.

Let’s explore all four.

1. LLMs provide knowledge of the world

Prior conversational assistants relied on predefined paths and were thus limited by the creativity and endurance of their makers. If a customer asked “In Star Wars, who shot first, Han Solo or Greedo?” past models would have errored out. But now, they can fire up the LLM to provide that general world knowledge and respond “Greedo.” Then they can direct users back to the happy path: “Are you ready to place an order?”

But as the now-outraged Star Wars fans reading this are thinking, LLMs are prone to lie. (Han shot first.) LLMs are trained on language, not truth, so they hallucinate: they can be wrong even while sounding confident.

The example above may be silly, but it’s indicative—LLMs have no way to judge the various versions of the Star Wars films any better than they can provide true information about your product or how it’s used, unless you specifically program that into the flow.

So, LLMs fill a gap. But to keep them from spouting misinformation, they first need to be filtered through the company’s own context.

2. Company-specific knowledge bases provide details about your company and product

A knowledge base is a customer-facing repository of documents like web pages, PDFs, and text files that you can turn into a vector database, searchable by a language model. When someone asks the assistant something it can’t answer, it should check here before it asks an LLM to guess. 

The knowledge base is of course far smaller than the LLM’s general training data, but it’s guaranteed to be relevant and exceedingly accurate. If a customer asks how to enter their credit card details, you don’t want the LLM searching for real (or imagined) examples of how to do this. You want it to locate the official company answer. It’s also more efficient.
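This “check the knowledge base first, fall back to the LLM second” routing can be sketched in a few lines of Python. The bag-of-words embedding below is a toy stand-in for a real embedding model and vector database, and the documents and threshold are made up for illustration:

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model: bag-of-words word counts.
# A production system would use a trained embedding model and a vector DB.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Company knowledge base: official, curated answers keyed by an indexed query.
KB = {
    "To enter your credit card, open Settings > Billing and click Add Card.":
        "how do I enter my credit card details",
    "Our support team is available on weekdays.":
        "when is support available",
}

def answer(question, threshold=0.3):
    """Check the knowledge base first; fall back to the LLM below threshold."""
    q = embed(question)
    best_doc, best_score = None, 0.0
    for doc, indexed_query in KB.items():
        score = cosine(q, embed(indexed_query))
        if score > best_score:
            best_doc, best_score = doc, score
    if best_score >= threshold:
        return ("kb", best_doc)      # official company answer
    return ("llm", None)             # hand off to the LLM for general knowledge
```

A question about billing retrieves the official answer; an oddball Star Wars question misses the knowledge base entirely and is routed to the LLM instead.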

Say your assistant is fully run by an LLM, and it goes off-piste and asks a user for their username and password when prompted to help them log in. Unfortunately, the LLM can’t actually do anything with those credentials. And your user will be sitting there thinking, “What next?”

Think of LLMs as "thinking machines" and the current generation of rules-based assistants as "doing machines." Mash them together, and you get something quite useful.

3. Dialog managers provide goals, actions, and integrations

So, your LLM has asked for the user's login information, which it can't actually do anything with; it just generates language. You now need a dialog manager that can capture and hold that information, then feed it to an API that connects to your app and initiates the login.
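The capture-and-hold step is classic slot filling. Here is a minimal Python sketch, where `login_api` is a hypothetical placeholder for your real authentication endpoint:

```python
def login_api(username, password):
    # Hypothetical stand-in: a real implementation would call your auth service.
    return {"ok": True, "user": username}

class LoginDialog:
    """Tiny slot-filling dialog manager: collect credentials, then act."""
    SLOTS = ["username", "password"]

    def __init__(self):
        self.filled = {}

    def next_prompt(self):
        for slot in self.SLOTS:
            if slot not in self.filled:
                return f"Please enter your {slot}."
        return None  # all slots filled

    def handle(self, utterance):
        # Fill the first empty slot with the user's reply.
        for slot in self.SLOTS:
            if slot not in self.filled:
                self.filled[slot] = utterance
                break
        if self.next_prompt() is None:
            # The dialog manager, not the LLM, passes credentials to the API.
            return login_api(**self.filled)
        return self.next_prompt()
```

The key design point is that the language layer only prompts and collects; the dialog manager owns the state and is the only component that touches the API.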

But you can’t ask users to log in every single time. Nor build orders from scratch every time. Nor should they have to repeat themselves if they gave you part of their order, asked a non-sequitur question about Disney films, then wanted to complete the transaction.

To maintain all that context, you’ll need the fourth foundational layer. 

4. Saved customer conversation context adds conversational and relational awareness

If you listen to friends speaking to each other, you don’t often hear them use each other’s names. The introductions are implied. They have years of context. And if you eavesdrop on their conversation, they hop between topics, sometimes pursuing two or three threads at once. This is difficult for assistants to do, but getting it right makes for truly delightful experiences.

Push all that saved customer data plus conversation history through an LLM to generate more specific and helpful responses. With this context, the LLM isn’t going to ask them to repeat things they’ve already done. It will remember their last order, as well as the part of the order they started then stopped. And when that customer initiates a conversation, it knows they recently placed an order and asks, “Are you calling about your last order?”
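In practice, “pushing context through an LLM” usually means assembling the saved profile and recent turns into the prompt. A minimal sketch, assuming a hypothetical profile shape and leaving the actual model call out:

```python
def build_prompt(profile, history, utterance):
    """Assemble saved customer context plus conversation history into an
    LLM prompt. The profile fields here are illustrative assumptions."""
    lines = ["You are a barista assistant. Known customer context:"]
    lines.append(f"- Name: {profile['name']}")
    lines.append(f"- Last order: {profile['last_order']}")
    if profile.get("unfinished_order"):
        lines.append(f"- Unfinished order: {profile['unfinished_order']}")
    lines.append("Recent conversation:")
    lines += [f"  {who}: {text}" for who, text in history]
    lines.append(f"Customer: {utterance}")
    return "\n".join(lines)
```

With the last order and any half-finished order in the prompt, the model can answer “Are you calling about your last order?” instead of starting from zero.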

Like a human barista, the assistant is thinking and acting on multiple layers all at once, which creates a dynamic, satisfying, and successful interaction. 

Let’s see those four layers in action

You can see the handoff from our rules-based dialog manager in the screenshot below. When it reaches the “What is Voiceflow” utterance, you can see the “no match” icon. That boots up our LLM, which is fed the conversation context to take a pass at the response. You can see the “AI” icon next to that response, so we know where it came from. The same thing happens for the question, “Who are your competitors?”

Finally, the user gets back on track by invoking our “customer stories” intent which had a pre-written response. Now they’ve resumed the happy path, and the dialog manager provides suggested stories.
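The routing logic behind that handoff can be sketched as a rules-first dispatcher. The substring intent matching and `llm_fallback` below are illustrative stand-ins for a real intent classifier and model call:

```python
# Pre-written responses for known intents (the "happy path").
INTENTS = {
    "customer stories": "Here are some customer stories you might like.",
    "place order": "Sure, what would you like?",
}

def llm_fallback(utterance, context):
    # Hypothetical stand-in for a real LLM call that receives the
    # conversation context alongside the unmatched utterance.
    return f"[AI] answering '{utterance}' with {len(context)} turns of context"

def route(utterance, context):
    """Try the rules-based dialog manager first; on no match, hand off
    the utterance plus context to the LLM."""
    for intent, reply in INTENTS.items():
        if intent in utterance.lower():
            return ("dialog_manager", reply)
    return ("llm", llm_fallback(utterance, context))
```

A “customer stories” utterance gets the pre-written response, while “What is Voiceflow?” triggers the no-match branch and lands with the LLM, mirroring the screenshot above.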

People don’t like to reach dead-ends. Even if they know it’s a bot, it’s still awkward. So the use of LLMs in the example above led to a far better customer experience and made the conversation flow; it wasn’t rigid. And further, we can now improve our assistant with custom language by designing flows for utterances we didn’t have coverage for, and questions we didn’t know we would get.

Thinking about building your own LLM? Think again

To all you craftsman readers thinking you could build your own LLM and do even better: maybe. But it’s probably not realistic for most companies. You need a corpus of tens of billions of words, the compute to train on it, and a team to evaluate and fine-tune the result, and even if you spend 6-12 months on it, you still won’t be close to what’s already commercially available. And the commercially available options are reasonably priced enough that it’s almost always worth running a pilot with them first.

And anyway, if you’re using preexisting LLM technology, you’re still going to run into the truthfulness problem. Your LLM is essentially guessing the next word, but that’s much less of a problem when you’re using all four layers as we’ve described.

If you’re curious, you can try these features within Voiceflow.

Say no to random acts of GPT

I’m hearing lots from customers right now who feel frustrated that executives are demanding they incorporate ChatGPT into everything without a plan. Product owners are responding, “I can’t. That’s not what the technology does. This is not how that works.” Hopefully, this article is a decent response to those demands.

LLMs are the future. But they’re not the entire future. For the time being, there are still three other important parts.

