Abstract
Large language models are successful in answering factoid questions but are
also prone to hallucination. We investigate the phenomenon of LLMs possessing
correct answer knowledge yet still hallucinating from the perspective of
inference dynamics, an area not previously covered in studies on
hallucinations. We conduct this analysis via two key ideas. First, we identify
factual questions that query the same triplet knowledge but yield different
answers; the differences in model behavior between the correct and incorrect
outputs thus reveal the patterns that emerge when hallucinations occur. Second,
to measure these patterns, we map the residual streams to the vocabulary space.
We uncover distinct dynamics of the output token probabilities across layer
depths between correct and hallucinated cases: in hallucinated cases, the
output token's information rarely shows abrupt increases or consistent
dominance in the later layers of the model. Leveraging the dynamic curve as a
feature, we build a classifier that detects hallucinatory predictions with an
88% success rate. Our study sheds light on why LLMs hallucinate on facts they
know and, more importantly, on accurately predicting when they are
hallucinating.
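
To make the residual-stream-to-vocabulary mapping concrete, below is a minimal logit-lens-style sketch, not the authors' implementation: the model name ("gpt2"), the prompt, and the per-layer projection code are illustrative assumptions. It extracts the kind of per-layer probability curve for the predicted answer token that the abstract describes as the feature for the hallucination classifier.

```python
# Sketch (assumptions: GPT-2 via Hugging Face transformers; not the paper's code):
# project each layer's residual stream into vocabulary space and track the
# predicted answer token's probability across layer depths ("dynamic curve").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM exposing hidden states works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"  # illustrative factoid-style prompt
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Token the model actually predicts at the final position (greedy).
answer_id = out.logits[0, -1].argmax().item()

# Logit-lens projection: apply the final layer norm and unembedding to the
# residual stream at each layer's last position, then read off the answer
# token's probability.
curve = []
num_states = len(out.hidden_states)  # embeddings + one state per layer
for i, hidden in enumerate(out.hidden_states):
    h = hidden[0, -1]
    if i < num_states - 1:
        h = model.transformer.ln_f(h)  # intermediate streams still need the final norm (GPT-2-specific module path)
    logits = model.lm_head(h)
    curve.append(torch.softmax(logits, dim=-1)[answer_id].item())

print(curve)  # per-layer probability trajectory of the output token
```

Under the abstract's finding, a curve like this would show a sharp late-layer rise for correct answers and stay flat or unstable for hallucinated ones; feeding such curves to any off-the-shelf binary classifier is one plausible way to realize the detector the paper reports.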