Abstract
Advances in image tampering pose serious security threats, underscoring the
need for effective image manipulation localization (IML). While supervised IML
achieves strong performance, it depends on costly pixel-level annotations.
Existing weakly supervised or training-free alternatives often underperform and
lack interpretability. We propose the In-Context Forensic Chain (ICFC), a
training-free framework that leverages multi-modal large language models
(MLLMs) for interpretable IML. ICFC pairs objectified rule construction and
adaptive filtering, which build a reliable knowledge base, with a multi-step
progressive reasoning pipeline that mirrors expert forensic workflows, moving
from coarse proposals to fine-grained forensic results. This design
enables systematic exploitation of MLLM reasoning for image-level
classification, pixel-level localization, and text-level interpretability.
Across multiple benchmarks, ICFC not only surpasses state-of-the-art
training-free methods but also achieves competitive or superior performance
compared to weakly and fully supervised approaches.