A General Theoretical Paradigm to Understand Learning from Human Preferences

arxiv

AI Security Portal bot

The information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2310.12036

PDF

https://arxiv.org/pdf/2310.12036

Bibliographic information

Authors: Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
Published: 2023-10-19
Updated: 2023-11-22
Affiliation: Google DeepMind
Country of affiliation: United Kingdom
Conference: International Conference on Artificial Intelligence and Statistics (AISTATS)

Labels estimated by AI

Deep learning, Alignment, Data generation methods

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

The prevalent deployment of learning from human preferences through reinforcement learning (RLHF) relies on two important approximations: the first assumes that pairwise preferences can be substituted with pointwise rewards. The second assumes that a reward model trained on these pointwise rewards can generalize from collected data to out-of-distribution data sampled by the policy. Recently, Direct Preference Optimisation (DPO) has been proposed as an approach that bypasses the second approximation and learns a policy directly from collected data without the reward modelling stage. However, this method still heavily relies on the first approximation. In this paper we try to gain a deeper theoretical understanding of these practical algorithms. In particular we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations. This new general objective allows us to perform an in-depth analysis of the behavior of RLHF and DPO (as special cases of $\Psi$PO) and to identify their potential pitfalls. We then consider another special case for $\Psi$PO by setting $\Psi$ simply to Identity, for which we can derive an efficient optimisation procedure, prove performance guarantees and demonstrate its empirical superiority to DPO on some illustrative examples.
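
The contrast the abstract draws between DPO and the Identity case of $\Psi$PO (usually called IPO) can be illustrated with a minimal per-example sketch. The loss forms below are recalled from the DPO and $\Psi$PO papers rather than stated in this entry, so treat them as an assumption; the function names and the scalar log-probability inputs are hypothetical and chosen only for illustration.

```python
import math


def log_ratio_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Margin h = log[pi(y_w) / pi_ref(y_w)] - log[pi(y_l) / pi_ref(y_l)].

    y_w is the preferred completion and y_l the rejected one; all inputs are
    sequence log-probabilities under the policy or the reference model.
    """
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)


def dpo_loss(h, beta):
    """Assumed DPO per-example loss: -log(sigmoid(beta * h)).

    Written as softplus(-beta * h) in a numerically stable form.
    """
    z = beta * h
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def ipo_loss(h, tau):
    """Assumed IPO per-example loss: (h - 1 / (2 * tau)) ** 2.

    The margin is regressed toward the finite target 1 / (2 * tau).
    """
    return (h - 1.0 / (2.0 * tau)) ** 2


if __name__ == "__main__":
    h = log_ratio_margin(-1.0, -2.0, -1.5, -1.5)
    print("margin:", h)
    print("DPO loss:", dpo_loss(h, beta=0.1))
    print("IPO loss:", ipo_loss(h, tau=0.5))
```

Under these assumed forms, the DPO loss keeps decreasing as the margin grows, whereas the squared IPO loss is minimised at the finite margin 1/(2τ) and penalises overshooting it. This is one way to read the abstract's claim that the Identity choice avoids a pitfall of DPO, though the paper itself should be consulted for the exact statement.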

External datasets