Graph Retrieval-Augmented Generation for Large Language Models: A Survey

Procko, Tyler; Ochoa, Omar

4 Pages Posted: 15 Aug 2024

Tyler Procko

Omar Ochoa

Embry-Riddle Aeronautical University

Date Written: July 13, 2024

Abstract

Large Language Models (LLMs) demonstrate general knowledge, but they perform poorly when the specific knowledge a task requires is absent from their training set. Two approaches to ameliorating this without retraining are 1) prompt engineering and 2) Retrieval-Augmented Generation (RAG). RAG is a form of prompt engineering, insofar as relevant lexical snippets retrieved from RAG corpora are vectorized and aggregated with prompts. However, RAG documents are often noisy, i.e., while relevant to a given prompt, they can contain much other information that obscures the desired snippet. Whereas pre-training an LLM on massive, general corpora aims to produce a generally applicable model, RAG has a different purpose: it is a means of LLM optimization, and as such, RAG document selection must be precise, not general. For expert tasks, it is imperative that a RAG corpus be as noise-free as possible, in much the same way that a good prompt should be free of irrelevant text. Knowledge Graphs (KGs) provide a concise means of representing domain knowledge free of noisy information. This paper surveys work incorporating KGs with LLM RAG, intending to equip scientists with a better understanding of this novel research area for future work.
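
To illustrate the idea of aggregating concise KG facts with a prompt, the following is a minimal sketch and not a method taken from the surveyed work. The toy triples, the names `KG`, `retrieve_triples`, and `build_prompt`, and the prompt layout are all hypothetical. For brevity it matches entities by keyword rather than by the vector similarity that RAG systems typically use.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy knowledge graph; a real KG would live in a graph store.
KG: List[Triple] = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("ibuprofen", "treats", "inflammation"),
]


def retrieve_triples(question: str, kg: List[Triple]) -> List[Triple]:
    """Return triples whose subject or object is mentioned in the question.

    Assumption: simple keyword matching stands in for vector retrieval.
    """
    tokens = set(question.lower().replace("?", " ").split())
    return [t for t in kg if t[0] in tokens or t[2] in tokens]


def build_prompt(question: str, kg: List[Triple]) -> str:
    """Aggregate the retrieved facts with the user's question."""
    facts = retrieve_triples(question, kg)
    context = "\n".join(
        f"{s} {p.replace('_', ' ')} {o}" for s, p, o in facts
    )
    return f"Facts:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Does aspirin interact with warfarin?", KG))
```

Only the triples that mention an entity from the question reach the prompt, so the unrelated ibuprofen fact is left out. This is the sense in which a KG-backed corpus can keep the retrieved context free of surrounding noise.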

Keywords: LLM, GPT, fine-tuning, knowledge graphs, RAG

Suggested Citation:

Procko, Tyler and Ochoa, Omar, Graph Retrieval-Augmented Generation for Large Language Models: A Survey (July 13, 2024). Available at SSRN: https://ssrn.com/abstract=4895062