Privacy-Preserving Reinforcement Learning Beyond Expectation

Source

https://arxiv.org/abs/2203.10165

PDF

https://arxiv.org/pdf/2203.10165

Paper Information

Authors: Arezoo Rajabi; Bhaskar Ramasubramanian; Abdullah Al Maruf; Radha Poovendran
Published: 3-19-2022
Affiliation: Network Security Lab, Department of Electrical and Computer Engineering, University of Washington
Country: United States of America
Conference: IEEE Conference on Decision and Control (CDC)

Labels Estimated by AI

Privacy Assessment; Risk Assessment Method; Reinforcement Learning Algorithm

These labels were automatically added by AI and may be inaccurate.

Abstract

Cyber and cyber-physical systems equipped with machine learning algorithms, such as autonomous cars, share environments with humans. In such a setting, it is important to align system (or agent) behaviors with the preferences of one or more human users. We consider the case when an agent has to learn behaviors in an unknown environment. Our goal is to capture two defining characteristics of humans: i) a tendency to assess and quantify risk, and ii) a desire to keep decision making hidden from external parties. For the former, we incorporate cumulative prospect theory (CPT) into the objective of a reinforcement learning (RL) problem. For the latter, we use differential privacy. We design an algorithm that enables an RL agent to learn policies that maximize a CPT-based objective in a privacy-preserving manner, and we establish guarantees on the privacy of the value functions learned by the algorithm when rewards are sufficiently close. This is accomplished by adding calibrated noise using a Gaussian process mechanism at each step. Through empirical evaluations, we highlight a privacy-utility tradeoff and demonstrate that the RL agent is able to learn behaviors that are aligned with those of a human user in the same environment in a privacy-preserving manner.
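
As a rough illustration of the CPT-based objective mentioned in the abstract, the following minimal Python sketch computes the CPT value of a discrete reward distribution. The abstract does not state which utility or probability-weighting functions the authors use. The Tversky-Kahneman power utility and weighting function, their 1992 parameter estimates used as defaults, and the names `tk_weight` and `cpt_value` are all assumptions made here for illustration. The differential-privacy noise step is not modeled.

```python
def tk_weight(p, gamma):
    """Tversky-Kahneman probability weighting function (an assumed choice)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def cpt_value(outcomes, probs, alpha=0.88, beta=0.88, lam=2.25,
              gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of a discrete random variable (hypothetical helper).

    Gains (x >= 0) use utility x**alpha; losses use lam * (-x)**beta.
    Decision weights are differences of the weighting function applied
    to cumulative probabilities, taken from the extremes inward.
    """
    pairs = sorted(zip(outcomes, probs))
    value = 0.0

    # Gains: walk from the largest outcome down, tracking P(X >= x).
    tail = 0.0
    for x, p in reversed(pairs):
        if x < 0:
            break
        new_tail = tail + p
        weight = tk_weight(new_tail, gamma_gain) - tk_weight(tail, gamma_gain)
        value += (x ** alpha) * weight
        tail = new_tail

    # Losses: walk from the most negative outcome up, tracking P(X <= x).
    head = 0.0
    for x, p in pairs:
        if x >= 0:
            break
        new_head = head + p
        weight = tk_weight(new_head, gamma_loss) - tk_weight(head, gamma_loss)
        value -= lam * ((-x) ** beta) * weight
        head = new_head

    return value


# A fair gamble between -1 and +1 has expected value 0, yet its CPT value
# is negative under these assumed parameters because losses are weighted
# more heavily than gains.
fair_gamble = cpt_value([-1.0, 1.0], [0.5, 0.5])
```

With `alpha = beta = lam = gamma_gain = gamma_loss = 1`, the weighting function reduces to the identity and `cpt_value` returns the ordinary expectation, which is the sense in which a CPT objective goes "beyond expectation".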