In the era of large language models (LLMs), hallucination (i.e., the tendency
to generate factually incorrect content) poses a great challenge to the trustworthy and reliable deployment of LLMs in real-world applications. To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be
done to mitigate them (mitigation). To address these challenges, this work
presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the
different training or utilization stages of LLMs and extensively analyze the
potential factors that lead to LLM hallucination. Finally, we implement and
examine a series of widely used techniques to mitigate the hallucinations in
LLMs. Our work yields several important findings for understanding the origin of hallucinations and mitigating them in LLMs. Our code and data
can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.