The context window paradox: Why bigger might not be better

If you’ve been keeping an eye on large language models (LLMs), you’ll have noticed that context windows are getting bigger and bigger. Anthropic is now boasting a whopping 100,000-token window, dwarfing the 32,000 tokens of GPT-4’s largest variant. If you’re feeding whole reports into your agent or working with lengthy, complex prompts, this sounds like a miracle. Finally, you can give AI every single detail and it’ll give you the best, most detailed responses, right?

Well, a recent study suggests that bigger may not always be better when it comes to context windows. I’ve read through the entire 18-page study to pull out the key insights for you, along with seven ways you can make your documents and prompts easily retrievable for any LLM, no matter the size of its context window.

Is bigger always better? Not when it comes to context windows

Paychecks. Defensive linemen. Leg room. Some things are better when they’re bigger. But according to an arXiv study, “Lost in the Middle: How Language Models Use Long Contexts,” that’s not the case when it comes to LLMs’ context windows.

When researchers took a look at several LLMs, they discovered a U-shaped pattern. LLMs reliably retrieved information that fell at the start or tail end of the context (think introductions and conclusions). But the information in the middle? Not so much.

This pattern is reminiscent of something humans experience. It's called the Serial Position Effect. Hermann Ebbinghaus, back in the 1880s, found that humans tend to remember items at the beginning and end of a list better than those in the middle. These LLMs, in some ways, reflect our human cognitive tendencies.

And so, when LLMs process vast amounts of text, their attention seems to be more concentrated at the beginning and end, thinning out in the middle. Which means that in a context window that allows for 100,000 tokens, there’s an even bigger risk that the LLM will misinterpret or fail to retrieve important information in the middle. 

7 ways to help your LLM understand all the details, no matter the context size

Imagine you're in a lecture. You don't hang on to every single word the professor says. Instead, you focus on the key points, the bits that seem most important. Treat your LLM like a new student. A great way to do this is Retrieval Augmented Generation (RAG). Here, you implement a vector database (a kind of knowledge base) that retrieves the most relevant information and passes it to the LLM as part of the context in its prompt. This keeps relevant content from getting stuck in the middle.
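To make the idea concrete, here is a minimal, self-contained sketch of the RAG pattern. It is illustrative only: a real system would rank chunks by embedding similarity from a vector database, while here a toy word-overlap score stands in so the example runs anywhere, and all names are hypothetical rather than a specific library's API.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# A real system would use embedding vectors from a vector database;
# a crude word-overlap score stands in here so the sketch is runnable.
import re

def words(text):
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(query, chunk):
    """Toy stand-in for vector similarity: Jaccard word overlap."""
    q, c = words(query), words(chunk)
    return len(q & c) / len(q | c) if q | c else 0.0

def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most relevant to the query."""
    return sorted(chunks, key=lambda c: similarity(query, c), reverse=True)[:top_k]

def build_prompt(query, chunks):
    """Place retrieved chunks up front, where models attend most
    reliably, instead of burying them mid-context."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

knowledge_base = [
    "Product X-100 supports up to 4,000 tokens per request.",
    "Our refund policy allows returns within 30 days of purchase.",
    "The X-100 firmware update improves battery life.",
]
prompt = build_prompt("What is the refund policy?", knowledge_base)
print(prompt)
```

Because the retrieved chunks are prepended rather than appended deep inside a long document, they land in the part of the context the study found models handle best.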

To help make sure that your vector database is doing its job and surfacing the most relevant information, you can do seven simple things to format your documents (these tips also work for extra-long prompts):

  1. Structured layout: Organize your content with clear headings to help your LLM decide which parts of a text to zoom in on and prioritize when generating a response. Clear headings and structure also help the LLM easily navigate and pinpoint crucial information.
  2. Summaries: Start each section with a brief overview to let the model quickly grasp the essence of the section, without getting bogged down in the details.
  3. Frequently searched items: Ensure your documents spotlight frequently searched items like product names, IDs, and common queries. Define them clearly, wherever possible.
  4. Consistent formatting: Keep consistent formatting across your content to help the model recognize patterns and retrieve info more predictably.
  5. Jargon: Nobody likes jargon, not even LLMs. If you've got technical terms, make sure to provide clear explanations. Remember, treat your LLM like a new student.
  6. Break up blocks: Put the most important information related to your prompt up front. Use bullet points to help distill complex ideas into digestible chunks and break up dense blocks of text into shorter paragraphs.
  7. Table of contents: Creating a table of contents can help models traverse big documents quickly and keep things organized.

Big or small, help your LLM see it all

In the race to be the best in the AI space, LLM providers keep pushing for bigger windows. But, much like a human brain, LLMs can lose sight of important details in the middle. So, where does that leave us?

Using clear, consistent formatting in your prompts and documents will help your LLM parse the information you give it. And implementing RAG with a vector database (like a knowledge base) will help your LLM find the right information, even from the muddled middle. That way, you get the responses you want from your LLM, whether the context window is 4,000 tokens or 100,000.
