Policy Poisoning in Batch Reinforcement Learning and Control



Information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1910.05821

PDF

https://arxiv.org/pdf/1910.05821

Bibliographic Information

Authors: Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu
Published: 2019-10-14
Updated: 2019-10-31
Affiliation: University of Wisconsin–Madison
Country: United States of America
Conference: Conference on Neural Information Processing Systems (NeurIPS)

Labels Estimated by AI

Reinforcement learning environments, Attack evaluation, Attackers and malicious devices

Note: These labels were added automatically by AI and may therefore be inaccurate.

Abstract

We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy. The victim is a reinforcement learner or controller that first estimates the dynamics and the rewards from a batch data set, and then solves for the optimal policy with respect to the estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into learning a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: a tabular certainty-equivalence learner in reinforcement learning and a linear quadratic regulator in control. We show that both instantiations result in a convex optimization problem on which global optimality is guaranteed, and provide an analysis of attack feasibility and attack cost. Experiments show the effectiveness of policy poisoning attacks.
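The victim-and-attacker pipeline described in the abstract (estimate a model from a batch, plan against the estimate, and perturb the batch so that planning returns a chosen target policy) can be sketched for the tabular case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `estimate_model`, `q_values`, `greedy`, and `poison_rewards` and the two-state toy MDP are hypothetical. The attacker edits rewards only and shifts every sample of a state-action pair by the same amount. The attack cost is taken as the squared L2 change to the estimated reward table, and Dykstra's alternating-projection algorithm is swapped in for a general-purpose convex solver. The sketch relies on one observation: with the estimated transitions held fixed, the target policy's Q-values are linear in the rewards, so requiring the target action to beat every other action by a margin `eps` yields linear, hence convex, constraints.

```python
# Hypothetical sketch: tabular certainty-equivalence victim + reward poisoning.
# Assumptions (not from the paper): rewards-only attack, squared-L2 cost on the
# estimated reward table, Dykstra's algorithm in place of a convex solver.


def estimate_model(batch, n_s, n_a):
    """Victim step 1: empirical mean rewards and transition frequencies."""
    cnt = [[0] * n_a for _ in range(n_s)]
    rsum = [[0.0] * n_a for _ in range(n_s)]
    nxt = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    for s, a, r, s2 in batch:
        cnt[s][a] += 1
        rsum[s][a] += r
        nxt[s][a][s2] += 1
    if any(c == 0 for row in cnt for c in row):
        raise ValueError("every (s, a) pair must appear in the batch")
    R = [[rsum[s][a] / cnt[s][a] for a in range(n_a)] for s in range(n_s)]
    P = [[[nxt[s][a][t] / cnt[s][a] for t in range(n_s)] for a in range(n_a)]
         for s in range(n_s)]
    return R, P


def q_values(R, P, gamma, policy=None, iters=500):
    """Value iteration (policy=None) or evaluation of a fixed policy."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(Q[s]) if policy is None else Q[s][policy[s]]
             for s in range(n_s)]
    return Q


def greedy(Q):
    """Victim step 2: act greedily with respect to the estimated model."""
    return [max(range(len(q)), key=q.__getitem__) for q in Q]


def poison_rewards(R, P, gamma, target, eps=0.1, sweeps=500):
    """Smallest L2 change to R that makes `target` optimal by margin eps."""
    n_s, n_a = len(R), len(R[0])
    n = n_s * n_a
    # Column k: flattened Q^target when only pair k has reward 1 (linear map).
    cols = []
    for k in range(n):
        unit = [[1.0 if s * n_a + a == k else 0.0 for a in range(n_a)]
                for s in range(n_s)]
        Qk = q_values(unit, P, gamma, policy=target)
        cols.append([Qk[s][a] for s in range(n_s) for a in range(n_a)])
    # One halfspace c . x >= eps per (s, a != target[s]).
    rows = [[cols[k][s * n_a + target[s]] - cols[k][s * n_a + a]
             for k in range(n)]
            for s in range(n_s) for a in range(n_a) if a != target[s]]
    x = [R[s][a] for s in range(n_s) for a in range(n_a)]
    incr = [[0.0] * n for _ in rows]  # Dykstra correction terms
    for _ in range(sweeps):
        for i, c in enumerate(rows):
            y = [xj + dj for xj, dj in zip(x, incr[i])]
            gap = eps - sum(cj * yj for cj, yj in zip(c, y))
            step = max(gap, 0.0) / sum(cj * cj for cj in c)
            x = [yj + step * cj for yj, cj in zip(y, c)]  # halfspace projection
            incr[i] = [yj - xj for yj, xj in zip(y, x)]
    return [[x[s * n_a + a] for a in range(n_a)] for s in range(n_s)]


# Toy MDP (hypothetical): action a moves to state a; action 0 pays reward 1.
batch = [(s, a, 1.0 if a == 0 else 0.0, a)
         for s in range(2) for a in range(2) for _ in range(2)]
R, P = estimate_model(batch, 2, 2)
gamma, target = 0.9, [1, 1]
clean_policy = greedy(q_values(R, P, gamma))
R_poison = poison_rewards(R, P, gamma, target)
# Map the table change back onto the data set by shifting each sample.
delta = [[R_poison[s][a] - R[s][a] for a in range(2)] for s in range(2)]
poisoned = [(s, a, r + delta[s][a], s2) for s, a, r, s2 in batch]
R2, P2 = estimate_model(poisoned, 2, 2)
poisoned_policy = greedy(q_values(R2, P2, gamma))
print(clean_policy, poisoned_policy)
```

In this toy example the victim's greedy policy moves from always taking action 0 (`[0, 0]`) to the attacker's target of always taking action 1 (`[1, 1]`), while the estimated transitions are unchanged. Dykstra's method is used here only because it needs nothing beyond the standard library. Any quadratic-programming solver applied to the same halfspace constraints should return the same projection.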