Abstract
We expose a surprising failure of generalization in auto-regressive large
language models (LLMs). If a model is trained on a sentence of the form "A is
B", it will not automatically generalize to the reverse direction "B is A".
This is the Reversal Curse. For instance, if a model is trained on "Valentina
Tereshkova was the first woman to travel to space", it will not automatically
be able to answer the question, "Who was the first woman to travel to space?".
Moreover, the likelihood of the correct answer ("Valentina Tershkova") will not
be higher than for a random name. Thus, models do not generalize a prevalent
pattern in their training set: if "A is B" occurs, "B is A" is more likely to
occur. It is worth noting, however, that if "A is B" appears in-context, models
can deduce the reverse relationship. We provide evidence for the Reversal Curse
by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah
Hawthorne is the composer of Abyssal Melodies" and showing that they fail to
correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust
across model sizes and model families and is not alleviated by data
augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about
real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee
Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly
answers questions like the former 79% of the time, compared to 33% for the
latter.
Code available at: https://github.com/lukasberglund/reversal_curse.
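
The likelihood comparison described above can be made concrete with a short probe. The sketch below is illustrative only, not the released code linked above: it scores a reverse-direction completion with an off-the-shelf Hugging Face causal LM. The model name "gpt2" and the comparison name "Marcus Delgado" are stand-in assumptions; the paper's experiments finetuned GPT-3 and Llama-1 on the fictitious statements before testing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a placeholder for the finetuned GPT-3 / Llama-1 models in the paper.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` after `prompt`.

    Assumes the prompt's tokenization is a prefix of the full string's
    tokenization (true for GPT-2 BPE when `completion` starts with a space).
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    idx = torch.arange(prompt_len - 1, targets.shape[0])  # completion-token positions
    return log_probs[idx, targets[idx]].sum().item()

# Reverse-direction probe: after training on "Uriah Hawthorne is the composer
# of Abyssal Melodies", is the correct name any likelier than an arbitrary one?
prompt = "The composer of Abyssal Melodies is"
print("correct:", completion_logprob(prompt, " Uriah Hawthorne"))
print("random :", completion_logprob(prompt, " Marcus Delgado"))  # arbitrary name
```

Under the Reversal Curse, the two scores come out comparable on a model trained only in the "A is B" direction, which is the failure the abstract reports.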