RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines

Authors: Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu | Published: 2025-10-23

2025.10.23

Authors: Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu
Published: 2025-10-23

Source: https://arxiv.org/abs/2510.20768

PDF: https://arxiv.org/pdf/2510.20768

AIにより推定されたラベル

RAG RAGへのポイズニング攻撃サイバー脅威インテリジェンス

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Retrieval-Augmented Generation (RAG) has emerged as the dominant architectural pattern to operationalize Large Language Model (LLM) usage in Cyber Threat Intelligence (CTI) systems. However, this design is susceptible to poisoning attacks, and previously proposed defenses can fail for CTI contexts as cyber threat information is often completely new for emerging attacks, and sophisticated threat actors can mimic legitimate formats, terminology, and stylistic conventions. To address this issue, we propose that the robustness of modern RAG defenses can be accelerated by applying source credibility algorithms on corpora, using PageRank as an example. In our experiments, we demonstrate quantitatively that our algorithm applies a lower authority score to malicious documents while promoting trusted content, using the standardized MS MARCO dataset. We also demonstrate proof-of-concept performance of our algorithm on CTI documents and feeds.