Jumping off the AI hype train: NLUs in an LLM-dominated world

In the past few months, the AI hype train has zoomed past the NLU station—many folks seem to believe there’s no longer any use for natural language understanding (NLU) models. The conversation has shifted from deterministic NLU-based assistants to “LLM” everything. 

The effortless way in which folks have shrugged off such a foundational tool has me beginning to question a few things. Do our tried-and-true uses for NLUs still serve us in the world of large language models (LLMs)? 

I’d argue yes, NLUs still have a place in the conversational AI space. There are appropriate applications for both, with different benefits and drawbacks. The truth isn’t as simple as “AI can do it”—so jump off the hype train at this stop and stay awhile. 

In this article, we’ll explore some NLU history to understand why they’ve proved so useful, what you’ve gotta know about the limitations of LLMs, and where each of their strengths can serve you. And stick around until the end for a practical prompt you can plug into your digital agent’s LLM and test yourself. 

Natural language understanding—Where it all began

Before GPT-3, NLUs (and later, transformer language models) worked by taking the continuous spectrum of meaning in human language and organizing it into a discrete set of intents and entities. Imagine a user asking a voice assistant to play a specific song. The NLU model must analyze the input text, identify the intent (e.g. play_music), and extract relevant information (e.g. song title, artist) to execute the desired action. 

An NLU acts as the sorter, first analyzing and understanding the context of the words and phrases, and then placing them into appropriate categories. This process of breaking down human language into discrete intents allows computers to effectively communicate with and respond to users in ways that feel more natural and intuitive. 

From 2014 until 2020, this process was the pinnacle of innovation. NLUs allowed for a more structured and organized representation of human language, which made it easier for AI to comprehend and respond to user intents.

And so, in the past eight years, the industry has given rise to a canonical framework of thinking about NLU models, divided two ways:

1. Inference

Input: Typically a text or a sentence provided by the user, representing a natural language query or command.

Output: The NLU model identifies the intent and any relevant entities or information extracted from the input text.

2. Training model

Intent: An intent represents the underlying purpose or goal that a user wants to achieve through their input in a natural language processing system.

Entity: An entity refers to specific pieces of information or data extracted from a user's input, such as names, dates, or locations, which are relevant to the identified intent.

Utterance: An utterance is a user's input or expression in a natural language, which can be a single word, phrase, or sentence that the NLU system processes to identify intents and entities.
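
To make the vocabulary above concrete, here is a minimal Python sketch of the framework's data shapes. The class names (Entity, TrainingExample, InferenceResult) and the toy infer function are illustrative assumptions, not any vendor's API. The function scores intents by simple word overlap in place of a trained model, so it only shows what goes in and what comes out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str   # e.g. "artist"
    value: str  # e.g. "Queen"


@dataclass
class TrainingExample:
    utterance: str  # raw user input
    intent: str     # label the utterance belongs to
    entities: list[Entity] = field(default_factory=list)


@dataclass
class InferenceResult:
    intent: str
    confidence: float
    entities: list[Entity]


def infer(text: str, training: list[TrainingExample]) -> InferenceResult:
    """Toy stand-in for a trained NLU: pick the intent whose example
    utterance shares the most words with the input (Jaccard overlap)."""
    lowered = text.lower()
    words = set(lowered.split())
    best, best_score = None, 0.0
    for example in training:
        example_words = set(example.utterance.lower().split())
        union = words | example_words
        score = len(words & example_words) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = example, score
    if best is None:
        # Nothing overlapped at all: fall back to a hypothetical no_match intent.
        return InferenceResult("no_match", 0.0, [])
    # Entity extraction: keep known values for this intent that appear in the input.
    known = {
        (e.name, e.value)
        for example in training
        if example.intent == best.intent
        for e in example.entities
    }
    found = [Entity(n, v) for n, v in sorted(known) if v.lower() in lowered]
    return InferenceResult(best.intent, best_score, found)
```

Given one training example for play_music that contains an artist entity, a call such as infer("Play something by Queen", training) returns the play_music intent together with the matching artist entity. A real NLU replaces the word-overlap scoring with a trained classifier, but the input and output shapes stay the same.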

It is upon this framework that the entire recent past of conversational AI has been built. Many believe that AI and large language models are quite novel, when in reality, LLMs are an innovation birthed from this canonical NLU framework. 

Large language models—The model must grow

Bustling cities are not born in a single day. Population growth, economic opportunities, and advancements in technology drive a metropolis’ development. Similarly, the development of LLMs has been fueled by factors like the availability of vast amounts of data, advances in hardware, and improvements in algorithms.

LLMs differ significantly from earlier transformer-based NLU models used for identifying user intents or extracting entities. Large language models are akin to an all-in-one tool that can understand, generate, and complete tasks with human-like skill. They develop this ability through deep learning techniques, in which massive data sets containing diverse texts are used to train the models. Leveraging the power of transformers—a type of neural network architecture—LLMs capture long-range dependencies and learn complex patterns in language.

Unlike their NLU predecessors, which were designed for narrower tasks, LLMs are trained on a wide range of linguistic tasks and fine-tuned for specific applications. This allows them to perform various NLU tasks without the need for task-specific architectures or extensive retraining. As a result, LLMs offer greater flexibility, scalability, and efficiency in handling complex NLU tasks compared to traditional transformer-based models.

LLMs also have two capabilities beyond the scope of traditional NLUs, which are worth noting. They are: 

1. Emergent capabilities

As LLMs learn from diverse text sources, they pick up patterns and connections in the data. This allows them to develop a deep understanding of language and its nuances, which in turn leads to the emergence of new capabilities. In simple terms, these are unexpected skills or abilities that were not explicitly programmed into the AI but instead arose naturally during its training process.

As these models become more advanced, they can take on tasks beyond simple text generation or translation. For instance, an LLM with emergent capabilities might be able to answer complex questions, summarize lengthy documents, or even generate creative stories. Emergent capabilities leave room for future innovations in conversational AI that we’re not yet aware of. 

2. Role-playing

Just as a game master might set up an adventure for players by describing the setting, characters, and objectives, a system prompt helps define the scope of interaction with an LLM. A role-playing prompt like, “Pretend you’re a lawyer who is an expert in family law…” or “Pretend you’re a developer writing an SEO blog post on best practices…” would have been hard to imagine in a pre-LLM world. Today, users can leverage system prompts to explore various topics, extract valuable insights, or even seek creative solutions to complex problems.

Now that we’ve explored a brief history of NLUs and LLMs, it’s time to see how each model operates now and how they can serve your specific use cases (spoiler alert: LLMs aren’t the clear winners across the board). 

The case for NLUs: Better control and performance than LLMs 

In particular, NLUs outperform LLMs in two main categories: control and performance. 

1. Ownership + control

NLUs offer observability on how the model is making decisions

NLUs offer observability options to peer under the model layers and decision/activation paths. Machine learning practitioners and developers can inspect the metadata and representations to ensure that the model exhibits appropriate behavior in terms of balance, toxicity, performance, and more. This is much more transparent than LLMs, which are often called “black boxes” because of their lack of observability. 

NLUs offer more control over the possible set of outputs

Since the NLU is trained with a predefined set of intents and entities, the model’s output is well-behaved: It will only output intents and entity values that exist in the training data.

For example, an NLU model can consistently output the “pizza_order” intent when a user asks “Can I order a pizza?” whereas an LLM-based system could output labels like “order_pizza” or “buy_pizza” for the same intent, causing inconsistent results. 
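
One hedged way to picture this difference: a classifier with a closed label set can only ever return one of its known intents, whereas a free-form label from an LLM has to be checked against that set before downstream code acts on it. The sketch below assumes a hypothetical ALLOWED_INTENTS set and a hypothetical no_match fallback; neither comes from a specific NLU product.

```python
# Hypothetical closed set of intents the downstream system knows how to handle.
ALLOWED_INTENTS = frozenset({"pizza_order", "order_status", "no_match"})


def coerce_intent(label: str, allowed: frozenset = ALLOWED_INTENTS) -> str:
    """Return the label if it belongs to the closed set, else "no_match"."""
    # Normalize casing and separators so "Pizza Order" matches "pizza_order".
    normalized = label.strip().lower().replace("-", "_").replace(" ", "_")
    return normalized if normalized in allowed else "no_match"
```

With this guard, “order_pizza” and “buy_pizza” are both rejected rather than silently treated as new intents. An alias map could be added if such variants should resolve to “pizza_order” instead.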

NLU data remains within your infrastructure—never with a third-party vendor

NLU models are small enough to be deployed on a laptop, which means even startups can deploy these models. Because inference can run on hardware you control, user data never has to leave your infrastructure. This is a huge advantage when it comes to data sovereignty, privacy, and egress constraints (e.g. when handling medical records).

NLUs are better at controlling intellectual property

When it comes to LLMs, there are countless ongoing ownership and copyright disputes. With an NLU, you have access to all of the components in the data path and the training data. 

Having this level of control reduces the risk to your organization. Your model won’t disappear after a particularly bad lawsuit or suck your organization into AI-related legal troubles. 

NLUs offer unlimited control over model versions and deployment

Today, LLM vendors can update or deprecate their models with little or no notice. Updates can also change how a vendor filters model outputs, and this filtering can have adverse effects, generalizing outputs to suit a wider audience while becoming less useful for certain use cases. As a result, systems built on top of LLMs may exhibit unexpected behavior when the AI vendor releases new models (and behaviors). 

With an NLU, you’re shielded from the effects of an AI vendor’s frequent, mysterious changes. If you want to deploy a previous version of your NLU system, you can. If you want control over censorship and outputs, you have it. 

2. Performance

NLUs are easier to fine-tune for uncommon terminology

NLUs are small enough to be fine-tuned within a reasonable time frame and budget. This is incredibly useful if the language or area of knowledge you are operating within doesn’t overlap well with an LLM’s training data. 

For example, you can train your NLU on organic chemistry compounds if your user input is highly specific: “1-ethyl-3-(1-methylbutyl)cyclohexane”. You can fine-tune your NLU model on text data with keywords and phrases specific to your use case—whether it’s advanced mathematics or medical terminology.

NLUs reduce inference costs and latency

The baseline cost of operating NLUs is much lower than that of LLMs, for both self-hosted, open-source models and third-party vendors. The hardware cost is lower since less powerful hardware is needed; the operational cost is lower since less computation is needed to produce the same set of outputs.

NLUs reduce training costs and time

Due to their much smaller size and training data requirements, NLUs can be trained much faster with fewer GPUs than their LLM counterparts. This reduces training costs and time required to get the model up and running.

The case for LLMs: They can emulate NLU behavior with greater accuracy

On the other hand, LLMs do outperform NLUs in three ways. First, due to the higher capacity of LLMs, they can be used to emulate NLU language models with minimal training data and examples, and produce generally more accurate results, all without retraining the model. 

Second, LLMs enable free-form behavior such as open-ended entities that would otherwise be difficult to train into an NLU. Think of a user asking a complex question that the NLU hasn’t been trained for: an LLM would more easily be able to generate a correct answer based on extracting an open-ended entity. 

Finally, by using LLMs, you remove the necessity of a large training dataset—only a few examples are needed at most in the LLM prompt. As demonstrated below, no exhaustive list of entity values and utterance examples for each intent was provided; the system performs well and demonstrates the ability to disentangle very similar user inputs (only one word is different) into different intents.

Example: Pizza prompt for your LLM

The intent list should only contain the top three intents and ensure the output is a valid YAML.


Run NLU inference on the following user input:

"I want a tiny BBQ chicken wing"



Run NLU inference on the following user input:

"Do you think wings are better or pizza is better?"


Pizza prompt results with an LLM

  • By turning the temperature parameter down to 0.1, it is possible to produce highly deterministic outputs that conform to the specifications provided in the System prompt.
  • By tuning the System prompt, it is possible to emulate and reconfigure the NLU behavior without retraining:
      • We can ask the model to output intent classification confidence on an arbitrary scale
      • Additional intents and entities can be added into the system prompt
      • no_match behavior can be modified
  • The LLM inference has emergent behavior such as synonym resolution and value description (e.g. “any pizza topping”).
  • Though reducing temperature and adding more regularizers into the System prompt (e.g. “YAML output”, “ensure it is a valid YAML”, “```yaml”) can produce convincing YAML outputs, LLM outputs MUST always be validated and sanitized, as there could be nondeterministic behavior that is difficult to reproduce.
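
As a rough illustration of the validation step in the last bullet, the following Python sketch checks an already-parsed LLM response against a closed schema. The output shape (an intents list of name/confidence pairs plus an entities mapping), the intent names, the 0-to-1 confidence scale, and the function name are all assumptions for illustration; they are not taken from the prompt above. Parsing the raw YAML text (for example with PyYAML's yaml.safe_load) is left out so the sketch needs only the standard library.

```python
# Hypothetical intent names; a real system would use its own closed set.
ALLOWED_INTENTS = {"order_pizza", "order_wings", "no_match"}
MAX_INTENTS = 3  # the prompt asks for the top three intents only


def validate_nlu_output(parsed: object) -> dict:
    """Return a sanitized copy of the parsed LLM output, or raise ValueError."""
    if not isinstance(parsed, dict):
        raise ValueError("top-level value must be a mapping")

    intents = parsed.get("intents")
    if not isinstance(intents, list) or not intents:
        raise ValueError("'intents' must be a non-empty list")

    clean_intents = []
    for item in intents[:MAX_INTENTS]:  # drop anything beyond the top three
        if not isinstance(item, dict):
            raise ValueError("each intent must be a mapping")
        name = item.get("name")
        if not isinstance(name, str) or name not in ALLOWED_INTENTS:
            raise ValueError(f"unknown intent: {name!r}")
        confidence = item.get("confidence")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError("confidence must be a number")
        # Assumed scale: clamp confidence into [0.0, 1.0].
        clamped = min(max(float(confidence), 0.0), 1.0)
        clean_intents.append({"name": name, "confidence": clamped})

    entities = parsed.get("entities") or {}
    if not isinstance(entities, dict):
        raise ValueError("'entities' must be a mapping")
    # Coerce keys and values to plain strings before passing them downstream.
    clean_entities = {str(k): str(v) for k, v in entities.items()}

    return {"intents": clean_intents, "entities": clean_entities}
```

Rejecting unknown intent names outright, rather than trying to repair them, keeps the LLM-backed path as predictable as a closed-set NLU. The caller can then decide whether to retry the request or fall back to no_match.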

NLUs vs. LLMs: Where do NLUs fit into an LLM-dominated world?

On our journey, we’ve stopped to take in the history of NLUs, how LLMs have outpaced them, and where we can still utilize NLUs for use cases that require more control and performance. 

In the end, LLMs are incredibly powerful and can emulate NLUs very effectively. But I implore you to resist the urge to “ChatGPT everything”. Jump off the hype train and do your own due diligence to figure out what technology works best for your use cases. If you don’t, you might find yourself spending a lot of time and money on a technology that doesn’t work for you, doubling back from a costly AI train ride to your tried-and-true NLU. 

