Paper Information
- Author
- Yanting Wang,Wei Zou,Runpeng Geng,Jinyuan Jia
- Published
- 6-5-2025
- Updated
- 6-27-2025
- Affiliation
- Pennsylvania State University
- Country
- United States of America
- Conference
- Computing Research Repository (CoRR)
Abstract
Long context large language models (LLMs) are deployed in many real-world
applications such as RAG, agent, and broad LLM-integrated applications. Given
an instruction and a long context (e.g., documents, PDF files, webpages), a
long context LLM can generate an output grounded in the provided context,
aiming to provide more accurate, up-to-date, and verifiable outputs while
reducing hallucinations and unsupported claims. This raises a research
question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs)
in the context that contribute most to or are responsible for the generated
output by an LLM? This process, which we call context traceback, has various
real-world applications, such as 1) debugging LLM-based systems, 2) conducting
post-attack forensic analysis for attacks (e.g., prompt injection attack,
knowledge corruption attacks) to an LLM, and 3) highlighting knowledge sources
to enhance the trust of users towards outputs generated by LLMs. When applied
to context traceback for long context LLMs, existing feature attribution
methods such as Shapley have sub-optimal performance and/or incur a large
computational cost. In this work, we develop TracLLM, the first generic context
traceback framework tailored to long context LLMs. Our framework can improve
the effectiveness and efficiency of existing feature attribution methods. To
improve the efficiency, we develop an informed search based algorithm in
TracLLM. We also develop contribution score ensemble/denoising techniques to
improve the accuracy of TracLLM. Our evaluation results show TracLLM can
effectively identify texts in a long context that lead to the output of an LLM.
Our code and data are at: https://github.com/Wang-Yanting/TracLLM.