The AI Wild West—why you need a knowledge base for your AI agents

Conversational AI (CAI) is a burgeoning field. So is generative AI. Put the two together and it can feel like you’re throwing a vigilante cowboy into your already Wild West.

But what if instead of gun-slinging rivals, the two could be partners? It’s not the stuff of country westerns—essential information like your FAQs, terms and conditions, and key information about your company can all be integrated into your digital agent with the help of LLMs. 

The way to do it? With a knowledge base (KB). This is the simplest way to get an assistant to give specific, accurate answers to users. A knowledge base allows you to upload your proprietary data, so when customers ask your agent a question, it’ll expertly answer with the help of LLMs, but within your established guardrails.

Let’s break down why the knowledge base functionality in conversational AI platforms is about to be one of the most important features and how you might use it for your own agents (and beyond).

How will a knowledge base change how you build AI agents?

Building conversational AI agents often feels tedious. To implement your FAQs, internal information, or even data from your website, many teams lean on manual processes. Now that many conversational AI platforms are launching knowledge base features, your CAI teams should learn what to expect and how to use this feature to its full potential. With a knowledge base, your agent follows an order of priority, gleaning information from approved data before moving to other sources—the KB acts as the guardrail for your LLM.

Here’s an example of how an AI agent works with a KB: when your agent is in a listening state, it begins by using an NLU to match your user’s response to an intent you’ve already mapped, like collecting account information or surfacing a help link. If your agent can’t find a match, it moves down the order of priority and searches your knowledge base for the sections of uploaded documents with the closest semantic similarity. If it finds a match, it uses an LLM to generate an answer that addresses your user’s intent. If it finds no match in the KB either, it moves to generative AI to serve a relevant response.
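That tiered order of priority can be sketched in a few lines of Python. Everything below is a hypothetical stand-in: the intent map, the keyword-overlap "semantic" search, and the stubbed LLM call are toys meant only to show the flow, not any platform's real API.

```python
# Toy sketch of the tiered order of priority: NLU intents first,
# then knowledge base search, then pure generative fallback.

INTENTS = {  # intents you've already mapped in your agent
    "reset my password": "Here's the help link: /reset-password",
}

KB_CHUNKS = [  # sections of documents uploaded to the knowledge base
    "Refunds are processed within 5 business days of approval.",
]

def search_kb(message):
    """Stand-in for semantic search: return chunks sharing a word with the message."""
    words = set(message.lower().split())
    return [c for c in KB_CHUNKS if words & set(c.lower().split())]

def generate(message, context=None):
    """Stand-in for an LLM call; a real agent would prompt a model here."""
    if context:
        return f"Based on our docs: {context[0]}"
    return f"(generative fallback for: {message})"

def respond(message):
    # 1. NLU: try to match against pre-mapped intents first.
    if message.lower() in INTENTS:
        return INTENTS[message.lower()]
    # 2. Knowledge base: find the closest uploaded sections.
    chunks = search_kb(message)
    if chunks:
        return generate(message, context=chunks)
    # 3. Last resort: pure generative AI.
    return generate(message)
```

A real KB uses embeddings rather than keyword overlap, but the control flow is the point: approved data is consulted before the model is allowed to improvise.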

This tiered approach makes building and launching your agent much more efficient—no need to manually write intents based on data you already have. You’ll have fewer fears that your agent will hallucinate answers to questions or serve inaccurate information about your company. Think of a knowledge base like your stockroom: an inventory of your already existing product data, services, and information which your agent can access to serve the customer whenever they need it. Fill your stockroom with approved documents, tag them accordingly to improve accuracy, or set queries to only access certain datasets. 

And because you’re pairing the power of LLMs with the guardrails of your own data, the possibilities with a KB are many. Integrate your KB into your website to power your help center’s search function. Upload internal documentation like employee IDs and HR resources to create an internal agent that can summarize training and personalize professional development.

You can even implement KB internally to help your employees with research. Users can provide the KB a new source and ask it to summarize, analyze, and make recommendations based on that new data and KB’s existing dataset. Now, you can gain useful insights you’d expect from LLMs using your secure, proprietary data.   

How do you get the most out of your knowledge base?

  1. Remember: trash in, trash out

Your knowledge base is exactly what it sounds like—a foundational collection of your company’s knowledge. That means you should be really selective about what you fill your stockroom with. If your first instinct is to “give it everything,” remember, more is not always better.

Instead, upload succinct data to your KB. Don't include unnecessary data sources. Tag all documentation and keep a detailed record of your tagging practices so others can maintain it too. Your goal should be to fine-tune your dataset so your KB is using the best information available. That's what's going to get you really good responses. 
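The tagging advice above can be made concrete with a minimal sketch of scoping a query to certain datasets by tag. The documents and tag names here are invented for illustration; real platforms expose this through their own upload and filter settings.

```python
# Hypothetical example: tag each uploaded document, then restrict a
# query so it only searches documents carrying an allowed tag.

documents = [
    {"title": "Returns policy",    "tags": {"faq", "billing"}},
    {"title": "Employee handbook", "tags": {"internal", "hr"}},
    {"title": "Shipping FAQ",      "tags": {"faq", "shipping"}},
]

def scoped_search(allowed_tags):
    """Return titles of documents whose tags overlap the allowed set."""
    return [d["title"] for d in documents if d["tags"] & allowed_tags]
```

A customer-facing query scoped to `{"faq"}` never touches the employee handbook, which is exactly the kind of guardrail a detailed, consistently maintained tagging record buys you.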

Sure, if you have beautifully curated help articles already, this process is likely to be easier. But when it comes to LLMs, doing things the quick and dirty way doesn’t typically yield you the best results. Take the time and your future self will thank you. Don’t fret, I’ve created a detailed guide to help you out.

  2. Your knowledge base isn’t a parrot—more like a newly hired temp

Imagine you’ve hired a new temp and put them at the front desk of your office to answer everyone’s questions, speak on behalf of your brand, and complete tasks. But you didn’t train them; you just gave them access to the internet. In this scenario, you’re basically asking them to Google the right things—at best, they’ll make silly mistakes and at worst, they’ll go completely rogue. Without a knowledge base, digital AI agents are basically that sorely unprepared temp.

Instead, treat your KB like a new human temp. Don’t just hand them the internet, or a collection of disorganized company materials strewn together. Give them a curated training manual, go through each page of documentation and tag what’s important, give them context, and don’t expect them to piece together multiple sources of information to create a single cohesive answer.

If you’re struggling to get your KB to serve the right responses, consider these four questions:

Are you trying to do multiple steps in one?

  • Are you trying to perform a sentiment analysis, ask for a recommendation, and complete formatting in one step? 

Have you ensured the documentation you’ve uploaded to KB is accurate, concise, and complete?

  • Text documents with good line separations and questions close to their answers will perform best.
  • Documents with strange formatting, messy tables, and overlapping topics will perform worse. My colleague Peter Isaacs has some more advice, if you need it.

If you have Q&A pairs, does your KB allow them to be uploaded and added to your tiered order of intents?

  • e.g. Your agent searches for an answer to your user’s intent starting with your NLU, then your FAQs, then your KB, then finally generative AI.

Are you expecting the KB to generate the exact wording you’ve uploaded in your documentation? 

  • If you expect it to generate responses that match an exact sentence you’ve uploaded, you’re trying to brute-force a response. Either add that response manually to your agent using an NLU model, or keep tweaking your prompt.

Your KB is neither a parrot nor a magician. It uses your selected LLM to offer unique responses that are more dynamic, humanlike, and personal while sticking to your KB’s dataset. And an LLM, like a person, does better with a singular focus and a clear guide to follow (or in this case, a refined dataset).
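One of the questions above notes that documents with good line separations and questions close to their answers perform best. A toy splitter shows why: chunking on blank lines keeps each Q&A pair together in a single chunk, so the retriever never has to stitch a question to an answer from somewhere else. The document text is invented for the example.

```python
# Why formatting matters: splitting on blank lines keeps each
# question right next to its answer in one retrievable chunk.

DOC = """Q: How long do refunds take?
A: Refunds are processed within 5 business days.

Q: Do you ship internationally?
A: Yes, to over 40 countries."""

def chunk_on_blank_lines(text):
    """Split a document on blank lines, dropping empty blocks."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]

chunks = chunk_on_blank_lines(DOC)
```

A document with messy tables or run-together topics defeats this kind of splitting: the question lands in one chunk and its answer in another, and retrieval quality drops accordingly.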

  3. A knowledge base can help you get more efficient—if you let it

Having an AI agent in production can get expensive without thoughtful design, but by fine-tuning your KB, you can reduce chunk limits and token consumption. In all my time at Voiceflow, I’ve seen very few use cases that need more than 100 tokens. I once spoke with a customer who was using 25x the tokens other companies used. They were using GPT-4 because they believed it was the only LLM that could serve them “good” answers. In reality, if they had spent more time on their prompts, they would have received comparable responses from a cheaper LLM. It’s the same principle with knowledge bases.

The better curated your KB, the fewer tokens you’ll use. If your LLM isn’t pulling from disparate sources to piece together the same answers, it’ll be more efficient and cheaper. And just like training your temp, don’t give your KB several sources that all answer the same topics. The most efficient use of your time is to focus on prompts and fine-tune your KB so it can serve fewer, more concise responses. You’ll see major reductions in your token usage (and in turn, cost savings).
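As a back-of-the-envelope illustration of those savings, here is a toy cost comparison. The per-token rates below are purely hypothetical placeholders, not real model pricing; the 2,500-vs-100 token figures echo the 25x anecdote above.

```python
# Hypothetical per-token rates -- placeholders, not real pricing.
RATE_PREMIUM = 0.06 / 1000   # premium model, cost per token
RATE_BUDGET  = 0.002 / 1000  # cheaper model, cost per token

def monthly_cost(tokens_per_reply, replies_per_month, rate):
    """Total monthly spend for a given reply size and volume."""
    return tokens_per_reply * replies_per_month * rate

# A bloated KB forcing 2,500-token replies on a premium model,
# versus a curated KB serving 100-token replies on a cheaper one.
bloated = monthly_cost(2500, 100_000, RATE_PREMIUM)
curated = monthly_cost(100, 100_000, RATE_BUDGET)
```

The exact numbers don't matter; the multiplication does. Reply size and model choice compound, so trimming both is where the major reductions come from.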

Tame your AI Wild West with a knowledge base

Your knowledge base has the potential to be the peacekeeping sheriff in the Wild West of conversational AI. For developers and designers who need to deploy AI, fewer hallucinations, lower LLM costs, and more on-brand agents are just the beginning.

In the future, I foresee KB helping conversational AI teams perform computational functions and analyze complex data. For now, I’d encourage you to explore knowledge base features on your current conversational AI platform (or give Voiceflow’s a try). I happened to pen a nifty article on just that. You can find it right here.