Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

Abstract

Retrieval-augmented generation (RAG) has revitalized Large Language Models (LLMs) by injecting non-parametric factual knowledge. Compared with long-context LLMs, RAG is considered a more concise and lightweight summarization tool that can interact with LLMs multiple times using diverse queries to obtain comprehensive responses. However, the LLM-generated historical responses, which contain potentially insightful information, are largely neglected and discarded by existing approaches, leading to suboptimal results. In this paper, we propose graph of records (GoR), which leverages historical responses generated by LLMs to enhance RAG for long-context global summarization. Inspired by the retrieve-then-generate paradigm of RAG, we construct a graph by creating an edge between the retrieved text chunks and the corresponding LLM-generated response. To uncover the sophisticated correlations between them, GoR further features a graph neural network and an elaborately designed BERTScore-based objective for self-supervised model training, enabling seamless supervision signal backpropagation between reference summaries and node embeddings. We comprehensively compare GoR with 12 baselines on four long-context summarization datasets, and the results indicate that our proposed method achieves the best performance. Extensive experiments further demonstrate the effectiveness of GoR.

Graph of Records (GoR)

GoR randomly selects text chunks from long documents and feeds them into LLMs for query simulation. The simulated queries are saved as a self-supervised training corpus and are further used for graph construction, following the retrieve-then-generate paradigm of RAG. For model training, GoR uses GNNs to obtain node embeddings and computes their similarities to the query embedding. Finally, GoR applies contrastive learning and pair-wise ranking objectives based on the node ranking list derived from BERTScore.
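The graph construction and the ranking signal described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_record_graph` and `pairwise_ranking_loss`, the record format, and the margin-based hinge form of the loss are all assumptions made for this example. The node scores stand in for GNN-embedding similarities to the query, and the ranked list stands in for the ordering derived from BERTScore.

```python
from collections import defaultdict


def build_record_graph(records):
    """Build an undirected graph from retrieve-then-generate records.

    `records` is a hypothetical list of (retrieved_chunk_ids, response_id)
    pairs: one entry per simulated query. Each retrieved chunk is linked
    to the response that was generated from it.
    """
    adjacency = defaultdict(set)
    for chunk_ids, response_id in records:
        for chunk_id in chunk_ids:
            adjacency[chunk_id].add(response_id)
            adjacency[response_id].add(chunk_id)
    return adjacency


def pairwise_ranking_loss(scores, ranked_nodes, margin=0.1):
    """Average hinge loss over all ordered node pairs.

    `scores` maps node id -> similarity to the query embedding.
    `ranked_nodes` lists node ids best-first (assumed to come from a
    BERTScore comparison against the reference summary).
    The margin-based hinge form is an assumption for illustration.
    """
    total, num_pairs = 0.0, 0
    for i in range(len(ranked_nodes)):
        for j in range(i + 1, len(ranked_nodes)):
            better, worse = ranked_nodes[i], ranked_nodes[j]
            # Penalize pairs where the higher-ranked node does not
            # score at least `margin` above the lower-ranked node.
            total += max(0.0, margin - (scores[better] - scores[worse]))
            num_pairs += 1
    return total / num_pairs if num_pairs else 0.0


if __name__ == "__main__":
    # Two simulated queries: each retrieved two chunks and produced a response.
    records = [(["chunk_1", "chunk_2"], "resp_1"),
               (["chunk_2", "chunk_3"], "resp_2")]
    graph = build_record_graph(records)
    print(sorted(graph["chunk_2"]))

    scores = {"chunk_1": 0.9, "chunk_2": 0.5, "resp_1": 0.7}
    print(pairwise_ranking_loss(scores, ["chunk_1", "chunk_2", "resp_1"]))
```

In this sketch a chunk retrieved for several queries becomes a hub connecting their responses, which is what lets a GNN propagate information between historical responses and the source text. The loss is zero only when the similarity scores agree with the reference ordering by at least the margin.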

GoR Model Architecture

Experiments

Experimental results on QMSum, AcademicEval, WCEP, and BookSum datasets over long-context global summarization tasks w.r.t. Rouge-L (R-L), Rouge-1 (R-1), and Rouge-2 (R-2).

Experimental Results

BibTeX

@article{GoR,
  author    = {Haozhen Zhang and Tao Feng and Jiaxuan You},
  title     = {Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs},
  journal   = {arXiv preprint arXiv:2410.11001},
  year      = {2024},
}