Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI


Introduction

Retrieval-Augmented Generation (RAG) may have been necessary for the first wave of enterprise AI, but it’s quickly evolving into something much larger. Over the past two years, organizations have realized that simply retrieving text snippets using vector search isn’t enough. Context has to be governed, explainable, and adaptive to an agent’s purpose.

This post explores how that evolution is taking shape and what it means for data and AI leaders building systems that can reason responsibly.

You’ll come away with answers to a few key questions:

How do knowledge graphs improve RAG?

They provide structure and meaning to enterprise data, linking entities and relationships across documents and databases to make retrieval more accurate and explainable for both humans and machines.

How do semantic layers help LLMs retrieve better answers?

Semantic layers standardize data definitions and governance policies so AI agents can understand, retrieve, and reason over all kinds of data as well as AI tools, memories, and other agents.

How is RAG evolving in the age of agentic AI?

Retrieval is becoming one step in a broader reasoning loop (increasingly being called “context engineering”) where agents dynamically write, compress, isolate, and select context across data and tools.

TL;DR

Retrieval-Augmented Generation (RAG) rose to prominence following the launch of ChatGPT and the realization that the context window is limited: you can’t just copy all your data into the chat interface. Teams used RAG and its variants, like GraphRAG (RAG using a graph database), to bring additional context into prompts at query time. RAG’s popularity soon exposed its weaknesses: putting incorrect, irrelevant, or simply too much information into the context window can actually degrade rather than improve results. New techniques like re-rankers were developed to overcome those limitations, but RAG wasn’t built to survive in the new agentic world.

As AI shifts from single prompts to autonomous agents, retrieval and its variants become just one tool in an agent’s toolbelt, alongside writing, compressing, and isolating context. As workflows, and the information required to complete them, grow in complexity, retrieval will continue to evolve (though it may be called context engineering, RAG 2.0, or agentic retrieval). The next era of retrieval (or context engineering) will require metadata management across all data structures (not just relational), as well as across tools, memories, and agents themselves. We will evaluate retrieval not just for accuracy but also for relevance, groundedness, provenance, coverage, and recency. Knowledge graphs will be key for retrieval that is context-aware, policy-aware, and semantically grounded.

The Rise of RAG

What is RAG?

RAG, or Retrieval-Augmented Generation, is a technique for retrieving relevant information to augment a prompt that is sent to an LLM in order to improve the model’s response. 

Shortly after ChatGPT went mainstream in November 2022, users realized that LLMs weren’t (hopefully) trained on their own data. To bridge that gap, teams began developing ways to retrieve relevant data at query time to augment the prompt – an approach known as retrieval-augmented generation (RAG). The term came from a 2020 Meta paper, but the popularity of the GPT models brought the term and the practice into the limelight. 

Tools like LangChain and LlamaIndex helped developers build these retrieval pipelines. LangChain launched at around the same time as ChatGPT as a way of chaining components like prompt templates, LLMs, agents, and memory together for generative AI applications. LlamaIndex launched around the same time to address the limited context window in GPT-3, thereby enabling RAG. As developers experimented, they realized that vector databases provide a fast and scalable way to power the retrieval part of RAG, and vector databases like Weaviate, Pinecone, and Chroma became standard parts of the RAG architecture.
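
To make the pattern concrete, here is a minimal, framework-free sketch of that classic pipeline. The `embed` and `generate` functions are hypothetical stand-ins for whatever embedding model and LLM you plug in; at scale, a vector database would replace the in-memory scan.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def rag_answer(question, corpus, embed, generate, k=3):
    """Classic one-shot RAG: embed the question, retrieve the top-k chunks
    by vector similarity, and augment the prompt before calling the LLM."""
    q_vec = embed(question)
    ranked = sorted(corpus, key=lambda chunk: cosine(q_vec, chunk["vector"]), reverse=True)
    context = "\n\n".join(chunk["text"] for chunk in ranked[:k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```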

What is GraphRAG?

GraphRAG is a variation of RAG where the underlying database used for retrieval is a knowledge graph or a graph database. 

One variation of RAG became especially popular: GraphRAG. The idea here is that the underlying data to supplement LLM prompts is stored in a knowledge graph. This allows the model to reason over entities and relationships rather than flat text chunks. In early 2023, researchers began publishing papers exploring how knowledge graphs and LLMs could complement each other. In late 2023, Juan Sequeda, Dean Allemang, and Bryon Jacob from data.world released a paper demonstrating how knowledge graphs can improve LLM accuracy and explainability. In July 2024, Microsoft open-sourced its GraphRAG framework, which made graph-based retrieval accessible to a wider developer audience and solidified GraphRAG as a recognizable category within RAG. 
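
As a rough illustration (a toy sketch, not Microsoft’s GraphRAG implementation), graph-based retrieval starts from entities matched in the question and expands along relationships before assembling context. The adjacency-list graph and fact shapes below are invented for the example.

```python
def graph_rag_context(question_entities, graph, facts, hops=1):
    """Toy GraphRAG-style retrieval: expand from the entities mentioned in the
    question along graph edges, then collect the facts inside that subgraph."""
    seen = set(question_entities)
    frontier = set(question_entities)
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in graph.get(node, [])} - seen
        seen |= frontier
    # Keep only facts whose subject and object both fall inside the expanded subgraph
    return [f for f in facts if f["subject"] in seen and f["object"] in seen]

# Hypothetical data shapes:
# graph = {"Acme Corp": ["Jane Doe", "Widget X"], "Jane Doe": ["Acme Corp"]}
# facts = [{"subject": "Jane Doe", "object": "Acme Corp", "text": "Jane Doe is CEO of Acme Corp."}]
```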

The rise of GraphRAG reignited interest in knowledge graphs, reminiscent of the surge that followed Google’s launch of its Knowledge Graph in 2012. The sudden demand for structured context and explainable retrieval gave them new relevance.

From 2023–2025, the market responded quickly:

  • January 23, 2023 – Digital Science acquired metaphacts, creators of the metaphactory platform: “a platform that supports customers in accelerating their adoption of knowledge graphs and driving knowledge democratization.” 
  • February 7, 2023 – Progress acquired MarkLogic, a multi-model NoSQL database with a particular strength in managing RDF data, the core data format for graph technology.
  • July 18, 2024 – Samsung acquired Oxford Semantic Technologies, makers of the RDFox graph database, to power on-device reasoning and personal knowledge capabilities.  
  • October 23, 2024 – Ontotext and Semantic Web Company merged to form Graphwise, explicitly positioning around GraphRAG. “The announcement is significant for the graph industry, as it elevates Graphwise as the most comprehensive knowledge graph AI organization and establishes a clear path towards democratizing the evolution of Graph RAG as a category.” 
  • May 7, 2025 – ServiceNow announced its acquisition of data.world, integrating a graph-based data catalog and semantic layer into its enterprise workflow platform.

These are just the events related to knowledge graphs and adjacent semantic technology. If we expand the scope to include metadata management and/or semantic layers more broadly, there are even more deals, most notably Salesforce’s $8 billion acquisition of metadata leader Informatica.

These moves mark a clear shift: knowledge graphs are no longer just metadata management tools. They have become the semantic backbone for AI, moving closer to their origins in expert systems. GraphRAG made knowledge graphs relevant again by giving them a critical role in retrieval, reasoning, and explainability.

In my day job as the product lead for a semantic data and AI company, we work to close the gap between data and its actual meaning for some of the world’s biggest companies. Making their data AI-ready means making it interoperable, discoverable, and usable so it can feed LLMs contextually relevant information that produces safe, accurate results. This is no small task for large, highly regulated, complex enterprises managing exponentially growing volumes of data.

The fall of RAG and the rise of context engineering

Is RAG dead? No, but it has evolved. The original version of RAG relied on a single dense vector search and took the top results to feed directly into an LLM. GraphRAG built on this by adding in some graph analytics and entity and/or relationship filters. Those implementations almost immediately ran into constraints around relevance, scalability, and noise. These constraints pushed RAG forward into new evolutions known by many names: agentic retrieval, RAG 2.0, and most recently, context engineering. The original, naive implementation is largely dead, but its descendants are thriving and the term itself is still incredibly popular. 

Following the RAG hype cycle in 2024, there was inevitable disillusionment. While it is possible to build a RAG demo in minutes, and many people did, getting your app to scale in an enterprise is quite a bit dicier. “People think that RAG is easy because you can build a nice RAG demo on a single document very quickly now and it will be pretty nice. But getting this to actually work at scale on real world data where you have enterprise constraints is a very different problem,” said Douwe Kiela of Contextual AI, one of the authors of the original 2020 RAG paper from Meta.

One issue with scaling a RAG app is the volume of data needed at retrieval time. “I think the trouble that people get into with it is scaling it up. It’s great on 100 documents, but now all of a sudden I have to go to 100,000 or 1,000,000 documents,” says Rajiv Shah. But as LLMs matured, their context windows grew. The size of the context window was the original pain point RAG was built to address, raising the question of whether RAG is still necessary or useful. As Dr. Sebastian Gehrmann from Bloomberg points out, “If I am able to just paste in more documents or more context, I don’t need to rely on as many tricks to narrow down the context window. I can just rely on the large language model. There is a tradeoff here though,” he notes, “where longer context usually comes at a cost of significantly increased latency and cost.”

It isn’t just cost and latency that you risk by arbitrarily dumping more information into the context window; you can also degrade performance. RAG can improve responses from LLMs, provided the retrieved context is relevant to the initial prompt. If the context is not relevant, you can get worse results, something called “context poisoning” or “context clash”, where misleading or contradictory information contaminates the reasoning process. Even if you are retrieving relevant context, you can overwhelm the model with sheer volume, leading to “context confusion” or “context distraction.” While terminology varies, multiple studies show that model accuracy tends to decline beyond a certain context size. This was found in a Databricks paper back in August of 2024 and reinforced through recent research from Chroma, something they termed “context rot”. Drew Breunig’s post usefully categorizes these issues as distinct “context fails”.

To address the problem of overwhelming the model, or of providing incorrect or irrelevant information, re-rankers have grown in popularity. As Nikolaos Vasiloglou from RelationalAI states, “a re-ranker is, after you bring the facts, how do you decide what to keep and what to throw away, [and that] has a big impact.” Popular re-rankers include Cohere Rerank, Voyage AI Rerank, Jina Reranker, and BGE Reranker. But re-ranking alone is not enough in today’s agentic world. The newest generation of RAG has become embedded into agents–something increasingly known as context engineering.
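
The shape of a re-ranking step is simple even though the scoring models are not. In this sketch, `cross_encoder_score` is a hypothetical stand-in for any of the rerankers above: a second-pass model that scores each retrieved chunk against the query.

```python
def rerank(query, candidates, cross_encoder_score, keep=5, min_score=0.5):
    """Second-pass re-ranking sketch: rescore every retrieved chunk against the
    query, keep only the strongest matches, and throw the rest away."""
    scored = sorted(
        ((cross_encoder_score(query, chunk), chunk) for chunk in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [chunk for score, chunk in scored[:keep] if score >= min_score]
```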

What is Context Engineering?

“The art and science of filling the context window with just the right information at each step of an agent’s trajectory.” – Lance Martin, LangChain

I want to focus on context engineering for two reasons: the originators of the terms RAG 2.0 and Agentic Retrieval (Contextual AI and LlamaIndex, respectively) have started using the term context engineering, and it is a far more popular term based on Google search trends. Context engineering can also be thought of as an evolution of prompt engineering. Prompt engineering is about crafting a prompt in a way that gets you the results you want; context engineering is about supplementing that prompt with the appropriate context.

RAG grew to prominence in 2023, eons ago in the timeline of AI. Since then, everything has become ‘agentic’. RAG was created under the assumption that the prompt would be generated by a human, and the response would be read by a human. With agents, we need to rethink how this works. Lance Martin breaks down context engineering into four categories: write, compress, isolate, and select. Agents need to write (or persist or remember) information from task to task, just like humans. Agents will often have too much context as they go from task to task and need to compress or condense it somehow, usually through summarization or ‘pruning’. Rather than giving all of the context to the model, we can isolate it or split it across agents so they can, as Anthropic describes it, “explore different parts of the problem simultaneously”. Rather than risk context rot and degraded results, the idea here is to not give the LLM enough rope to hang itself. 
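
A toy sketch of how those four operations might sit together in an agent follows. The class and its collaborators are invented for illustration, not any framework’s actual API.

```python
class ContextEngine:
    """Illustrative container for Lance Martin's four context operations."""

    def __init__(self, memory_store, summarize):
        self.memory_store = memory_store  # persistent store, e.g. a dict or key-value DB
        self.summarize = summarize        # any LLM-backed summarizer

    def write(self, task_id, notes):
        """Persist context so later tasks can remember it."""
        self.memory_store.setdefault(task_id, []).append(notes)

    def compress(self, context, budget_chars=4000):
        """Prune or summarize context that has outgrown its budget."""
        return context if len(context) <= budget_chars else self.summarize(context)

    def isolate(self, subtasks):
        """Split context across sub-agents exploring parts of the problem."""
        return [{"subtask": s, "context": self.memory_store.get(s, [])} for s in subtasks]

    def select(self, task_id, retrievers, query):
        """Pull in only the memories and retrieved data relevant right now."""
        memories = self.memory_store.get(task_id, [])
        retrieved = [retrieve(query) for retrieve in retrievers]  # e.g. vector, keyword, graph lookups
        return memories + retrieved
```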

Agents have to use their memories when needed or call upon tools to retrieve additional information; that is, they need to select (retrieve) what context to use. One of those tools could be vector-based retrieval, i.e. traditional RAG. But that is just one tool in the agent’s toolbox. As Marc Brooker from AWS put it, “I do expect what we’re going to see is some of the flashy newness around vector kind of settle down and us go to a world where we have this new tool in our toolbox, but a lot of the agents we’re building are using relational interfaces. They’re using those document interfaces. They’re using lookup by primary key, lookup by secondary index. They’re using lookup by geo. All of these things that have existed in the database space for decades, now we also have this one more, which is kinda lookup by semantic meaning, which is very exciting and new and powerful.”

Those at the forefront are already doing this. Martin quotes Varun Mohan of Windsurf, who says, “we […] rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and … a re-ranking step where [context] is ranked in order of relevance.”

Naive RAG may be dead, and we are still figuring out what to call the modern implementations, but one thing seems certain: the future of retrieval is bright. How can we ensure agents are able to retrieve different datasets across an enterprise? From relational data to documents? The answer is increasingly being called the semantic layer. 

Context engineering needs a semantic layer

What is a Semantic Layer?

A semantic layer is a way of attaching metadata to all data in a form that is both human and machine readable, so that people and computers can consistently understand, retrieve, and reason over it.

There is a recent push from those in the relational data world to build a semantic layer over relational data. Snowflake even created an Open Semantic Interchange (OSI) initiative to attempt to standardize the way companies are documenting their data to make it ready for AI. 

But focusing solely on relational data is a narrow view of semantics. What about unstructured data and semi-structured data? That’s the kind of data that large language models excel at and what started all the RAG rage. If only there was a precedent for retrieving relevant search results across a ton of unstructured data 🤔.

Google has been retrieving relevant information across the entire internet for decades using structured data. By structured data, here, I mean machine-readable metadata, or as Google describes it, “a standardized format for providing information about a page and classifying the page content.” Librarians, information scientists, and SEO practitioners have been tackling the unstructured data retrieval problem through knowledge organization, information retrieval, structured metadata, and Semantic Web technologies. Their methods for describing, linking, and governing unstructured data underpin today’s search and discovery systems, both on the public web and in the enterprise. The future of the semantic layer will bridge the relational and unstructured data worlds by combining the rigor of relational data management with the contextual richness of library sciences and knowledge graphs.
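
For a flavor of that machine-readable metadata, here is a small schema.org-style description of a document, shown as a Python dict. All of the values are invented for illustration.

```python
# Schema.org-style structured metadata for one document: readable by both
# humans and machines. All values below are invented for illustration.
doc_metadata = {
    "@context": "https://schema.org",
    "@type": "Report",
    "name": "Q3 Credit Risk Assessment",
    "author": {"@type": "Organization", "name": "Example Corp"},
    "about": ["credit risk", "Basel III"],
    "dateModified": "2025-06-30",
    "isBasedOn": "https://example.com/datasets/loan-book",
}
```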


The future of RAG

Here are my predictions for the future of RAG. 

RAG will continue to evolve into more agentic patterns. Retrieval of context becomes just one part of a reasoning loop that also includes writing, compressing, and isolating context, and retrieval becomes iterative rather than one-shot. Anthropic’s Model Context Protocol (MCP) treats retrieval as a tool that can be exposed to an agent. OpenAI offers File Search as a tool that agents can call. LangChain’s agent framework LangGraph lets you build agents using a node-and-edge pattern (like a graph); in its quickstart guide, retrieval (in that case a web search) is just one of the tools the agent is given to do its job, and LangChain’s docs list retrieval as one of the actions an agent or workflow can take. Wikidata also has an MCP server that enables users to interact directly with its public data.
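
A framework-free sketch of that agentic pattern follows: retrieval is registered as one callable among several, and the loop decides step by step which to invoke. `llm_decide` is a hypothetical stand-in for the model choosing its next action, and the tool bodies are placeholders.

```python
def agent_loop(goal, tools, llm_decide, max_steps=8):
    """Agentic sketch: retrieval is one tool among many, called iteratively
    inside a reasoning loop rather than once up front."""
    context = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm_decide(context, list(tools))  # e.g. {"tool": "vector_search", "args": {"query": "..."}}
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = tools[action["tool"]](**action["args"])
        context.append(f"{action['tool']} -> {result}")  # write the observation back into context
    return "Stopped: step budget exhausted."

tools = {
    "vector_search": lambda query: "top chunks from the vector store",  # classic RAG retrieval
    "sql_query": lambda sql: "rows from the warehouse",                 # relational lookup
    "web_search": lambda query: "search results",                      # as in LangGraph's quickstart
}
```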

Retrieval will broaden to include all kinds of data (aka multimodal retrieval): relational data, content, and eventually images, audio, geodata, and video. LlamaIndex offers four ‘retrieval modes’: chunks, files_via_metadata, files_via_content, and auto_routed. It also offers composite retrieval, allowing you to retrieve from multiple sources at once. Snowflake offers Cortex Search for content and Cortex Analyst for relational data. LangChain offers retrievers over relational data, graph data (Neo4j), lexical search, and vectors.
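
Composite retrieval has a simple core shape, sketched below with hypothetical per-modality retrievers. The hard part in practice is making relevance scores comparable across sources, which is one reason vendors ship composite retrieval as a managed feature.

```python
def composite_retrieve(query, retrievers, top_k=10):
    """Composite retrieval sketch: fan the query out to modality-specific
    retrievers and merge results. Assumes scores are already normalized to
    a shared 0-1 scale, which a real system must engineer deliberately."""
    merged = []
    for source, retrieve in retrievers.items():
        for score, item in retrieve(query):
            merged.append((score, source, item))
    merged.sort(key=lambda r: r[0], reverse=True)
    return merged[:top_k]

# retrievers = {"relational": ..., "documents": ..., "images": ...}  # hypothetical callables
```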

Retrieval will broaden to include metadata about tools themselves, as well as “memories”. Anthropic’s MCP standardized how agents call tools using a registry of tools, i.e. tool metadata. OpenAI, LangChain, LlamaIndex, AWS Bedrock, Azure, Snowflake, and Databricks all have capabilities for managing tools, some via MCP directly, others via their own registries. On the memory side, both LlamaIndex and LangChain treat memories as retrievable data (short-term and long-term) that agents can query during workflows. Projects like Cognee push this further with dedicated, queryable agent memory.
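
The same similarity machinery used for documents can be pointed at tool descriptions themselves. A sketch, with hypothetical `embed` and `cosine` helpers passed in and an invented registry:

```python
def select_tools(task, registry, embed, cosine, k=3):
    """Tool-metadata retrieval sketch: rank registered tools (as an MCP-style
    registry might expose them) by how relevant their descriptions are to the task."""
    task_vec = embed(task)
    ranked = sorted(
        registry,
        key=lambda tool: cosine(task_vec, embed(tool["description"])),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical registry entries:
registry = [
    {"name": "search_contracts", "description": "Full-text search over signed contracts"},
    {"name": "query_sales_db", "description": "Run SQL against the sales warehouse"},
    {"name": "recall_memory", "description": "Fetch summaries of earlier conversations"},
]
```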

Knowledge graphs will play a key role as a metadata layer between relational and unstructured data, replacing the narrow definition of the semantic layer currently in use with a more robust metadata management framework. The market consolidation of the past couple of years, described above, is, I believe, an indication of the market’s growing acknowledgement that knowledge graphs and metadata management will be crucial as agents are asked to do more complicated tasks across enterprise data. Gartner’s May 2025 report, “Pivot Your Data Engineering Discipline to Efficiently Support AI Use Cases,” recommends that data engineering teams adopt semantic techniques (such as ontologies and knowledge graphs) to support AI use cases. Knowledge graphs, metadata management, and reference data management are already ubiquitous in large life sciences and financial services companies, largely because those industries are highly regulated and require fact-based, grounded data to power their AI initiatives. Other industries will start adopting these tried-and-true semantic technologies as their use cases mature and require explainable answers.

Evaluation metrics on context retrieval will gain popularity. Ragas, Databricks Mosaic AI Agent Evaluation, and TruLens all provide frameworks for evaluating RAG. Evidently offers open source libraries and instructional material on RAG evaluation. LangChain’s evaluation product LangSmith has a module focused on RAG. What is important is that these frameworks are not just evaluating the accuracy of the answer given the prompt; they evaluate context relevance and groundedness (how well the response is supported by the context). Some vendors are building out metrics to evaluate provenance (citations and sourcing) of the retrieved context, coverage (did we retrieve enough?), and freshness or recency.
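
To show the shape of those metrics, here is a deliberately naive scoring pass using token overlap. Real frameworks like Ragas or TruLens use LLM judges and labeled data instead; this is illustrative only.

```python
def eval_retrieval(question, retrieved_chunks, answer, gold_sources=None):
    """Naive token-overlap proxies for retrieval metrics. Illustrative only;
    production evaluators use LLM judges, not set intersection."""
    q = set(question.lower().split())
    ctx = set(" ".join(retrieved_chunks).lower().split())
    ans = set(answer.lower().split())
    metrics = {
        "context_relevance": len(q & ctx) / max(len(q), 1),  # does the context match the question?
        "groundedness": len(ans & ctx) / max(len(ans), 1),   # is the answer supported by the context?
    }
    if gold_sources is not None:  # coverage: did we retrieve enough of the known-relevant sources?
        metrics["coverage"] = len(set(retrieved_chunks) & set(gold_sources)) / max(len(gold_sources), 1)
    return metrics
```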

Policy-as-code guardrails will ensure retrieval respects access controls, policies, regulations, and best practices. Snowflake and Databricks already enable row-level access control and column masking. Policy engines like Open Policy Agent (OPA) and Oso are embedding access control into agentic workflows. As Dr. Sebastian Gehrmann of Bloomberg has found, “RAG is not necessarily safer,” and can introduce new governance risks. I expect the need for guardrails to grow to include more complicated governance rules (beyond access control), policy requirements, and best practices.
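
A sketch of a pre-context policy gate follows. The policy and document shapes are invented, and a real deployment would delegate the decision to an engine like OPA or Oso rather than inline rules.

```python
def policy_filter(user, docs, policies):
    """Policy-as-code sketch: drop retrieved documents the caller is not
    entitled to see before they ever reach the context window."""
    allowed = []
    for doc in docs:
        rule = policies.get(doc["classification"], {"roles": [], "regions": []})
        region_ok = "*" in rule["regions"] or user["region"] in rule["regions"]
        if user["role"] in rule["roles"] and region_ok:
            allowed.append(doc)
    return allowed

# Invented policy shapes:
policies = {
    "public":       {"roles": ["analyst", "admin"], "regions": ["*"]},
    "confidential": {"roles": ["admin"],            "regions": ["US", "EU"]},
}
```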

Conclusion

RAG was never the end goal, just the starting point. As we move into the agentic era, retrieval is evolving into one part of a fuller discipline: context engineering. Agents don’t just need to find documents; they need to understand which data, tools, and memories are relevant for each step of their reasoning. That understanding requires a semantic layer–a way to understand, retrieve, and govern data across the entire enterprise. Knowledge graphs, ontologies, and semantic models will provide that connective tissue. The next generation of retrieval won’t just be about speed and accuracy; it will also be about explainability and trust. The future of RAG is not retrieval alone, but retrieval that’s context-aware, policy-aware, and semantically grounded.

About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for TopBraid EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs, and the evolving role of semantics in AI systems.
