
Paychecks. Defensive linemen. Leg room. Some things are better when they’re bigger. But according to the arXiv paper “Lost in the Middle: How Language Models Use Long Contexts,” that’s not the case when it comes to LLMs’ context windows.
When the researchers tested several LLMs, they found a U-shaped pattern. The models retrieved information reliably when it fell at the start or tail end of the context (think introductions and conclusions). But information in the middle? Not so much.
This pattern is reminiscent of something humans experience: the serial position effect. Hermann Ebbinghaus, back in the 1880s, found that people tend to remember items at the beginning and end of a list better than those in the middle. In some ways, these LLMs reflect our own cognitive tendencies.
And so, when LLMs process vast amounts of text, their attention seems to concentrate at the beginning and end, thinning out in the middle. Which means that in a 100,000-token context window, there’s an even bigger risk that the LLM will misinterpret, or fail to retrieve, important information buried in the middle.
Imagine you’re in a lecture. You don’t hang on to every single word the professor says; you focus on the key points, the bits that seem most important. You can treat your LLM the same way with retrieval-augmented generation (RAG): implement a vector database (like a knowledge base) that retrieves only the relevant information and passes it to the LLM as part of its prompt. The database surfaces relevant content that would otherwise get stuck in the middle.
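If you’re new to RAG, here’s a minimal sketch of the idea, assuming the open-source sentence-transformers library, a toy in-memory list standing in for a real vector database, and made-up document chunks:

```python
# Minimal RAG sketch: embed chunks, retrieve the best matches for a
# question, and put them at the front of the prompt, where the model
# attends most reliably. Model name and chunks are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
    "Passwords can be reset from the account settings page.",
]
# In production these embeddings live in a vector database; here they
# sit in memory.
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q, chunk_embeddings)[0]
    top = scores.topk(min(top_k, len(chunks))).indices
    return [chunks[int(i)] for i in top]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))

# Retrieved facts go first, so nothing important lands in the middle.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Because only a handful of relevant chunks ever reach the prompt, the model never has to dig through a 100,000-token middle to find them.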
To help make sure that your vector database is doing its job and surfacing the most relevant information upfront, there are seven simple things you can do to format your documents (these tips also work for extra-long prompts); one common formatting tactic is sketched below.
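As an illustrative sketch rather than an official recipe, one widely used tactic is to split documents on their headings so each chunk is a self-contained, clearly labeled unit; the sample document and regex here are assumptions for demonstration:

```python
# Hedged example of retrieval-friendly formatting: split a markdown
# document on level-1 headings so each chunk covers one topic and keeps
# its heading as context. The sample doc is made up.
import re

doc = """# Refund policy
Refunds are processed within 5 business days.

# Password reset
Passwords can be reset from the account settings page.
"""

def chunk_by_heading(text: str) -> list[str]:
    """Split before each '# ' at a line start, keeping headings with bodies."""
    parts = re.split(r"(?m)^(?=# )", text)
    return [p.strip() for p in parts if p.strip()]

for chunk in chunk_by_heading(doc):
    print(chunk)
    print("---")
```

Self-contained, well-labeled chunks give the retriever clean units to match against, so the right passage rises to the top instead of languishing mid-document.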
In the race to be the best in the AI space, LLM providers keep pushing for bigger context windows. But, much like a human brain, an LLM can lose sight of important details in the middle. So, where does that leave us?
Using clear, consistent formatting in your prompts and documents will help your LLM parse the information you give it. And implementing RAG with a vector database (like a knowledge base) will help your LLM find the right information, even from the muddled middle. That way, you get the responses you want from your LLM, whether the context window is 4,000 tokens or 100,000.