RORL: Robust Offline Reinforcement Learning via Conservative Smoothing


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2206.02829

PDF

https://arxiv.org/pdf/2206.02829

Paper Information

Authors: Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
Published: June 7, 2022
Updated: October 22, 2022
Affiliation: Hong Kong University of Science and Technology
Country: Hong Kong

Labels Estimated by AI

Reinforcement Learning Environment Uncertainty Assessment Robustness

These labels were automatically added by AI and may be inaccurate.

Abstract

Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Because of the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To balance robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
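The abstract names three ingredients: smoothing the policy, smoothing the value function on states near the dataset, and conservative value estimation on those same perturbed states. The toy NumPy sketch below only computes loss values for each ingredient so their shapes and roles are concrete. It is not the authors' implementation, and every choice in it is an assumption made for illustration: a linear Q-ensemble, a linear-Gaussian policy with fixed standard deviation, perturbations drawn from an L-infinity ball of radius `eps`, random sampling as a stand-in for the inner maximization over perturbed states, a plain squared-difference value-smoothing penalty, and an ensemble-standard-deviation penalty with hypothetical coefficient `beta`. No gradients are computed.

```python
import numpy as np

# Hypothetical toy setup: linear Q-ensemble and linear-Gaussian policy (assumptions).
_init_rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_ENSEMBLE = 4, 2, 5

W_q = _init_rng.normal(size=(N_ENSEMBLE, STATE_DIM + ACTION_DIM))  # one row per Q-head
W_pi = _init_rng.normal(size=(ACTION_DIM, STATE_DIM))              # policy mean weights
POLICY_STD = 0.5                                                   # fixed policy std (assumed)


def q_ensemble(s, a):
    """Q-values of every ensemble head for a batch; returns shape (N_ENSEMBLE, batch)."""
    sa = np.concatenate([s, a], axis=-1)  # (batch, STATE_DIM + ACTION_DIM)
    return W_q @ sa.T


def policy_mean(s):
    """Mean action of the Gaussian policy; returns shape (batch, ACTION_DIM)."""
    return s @ W_pi.T


def perturb(s, eps, n_samples, rng):
    """Sample n_samples states uniformly from the L-inf ball of radius eps around each state."""
    noise = rng.uniform(-eps, eps, size=(n_samples,) + s.shape)
    return s[None] + noise  # (n_samples, batch, STATE_DIM)


def smoothing_losses(s, a, rng, eps=0.05, n_samples=8, beta=1.0):
    """Return (value_smoothing, policy_smoothing, conservative_ood) loss values."""
    s_hat = perturb(s, eps, n_samples, rng)

    # 1) Value smoothing: worst sampled squared change of Q(., a), averaged over heads.
    q_clean = q_ensemble(s, a)                                  # (N, batch)
    q_pert = np.stack([q_ensemble(sh, a) for sh in s_hat])      # (K, N, batch)
    value_smooth = ((q_pert - q_clean[None]) ** 2).mean(axis=1).max(axis=0).mean()

    # 2) Policy smoothing: the Jeffreys divergence between two Gaussians with the
    #    same isotropic std sigma equals ||mu1 - mu2||^2 / sigma^2.
    mu_clean = policy_mean(s)                                   # (batch, A)
    mu_pert = np.stack([policy_mean(sh) for sh in s_hat])       # (K, batch, A)
    jeffreys = ((mu_pert - mu_clean[None]) ** 2).sum(axis=-1) / POLICY_STD ** 2
    policy_smooth = jeffreys.max(axis=0).mean()

    # 3) Conservative estimation on perturbed states: pull each head toward
    #    (ensemble mean - beta * ensemble std), treated as a fixed target.
    q_ood = np.stack([q_ensemble(sh, policy_mean(sh)) for sh in s_hat])  # (K, N, batch)
    target = q_ood.mean(axis=1, keepdims=True) - beta * q_ood.std(axis=1, keepdims=True)
    conservative_ood = ((q_ood - target) ** 2).mean()

    return value_smooth, policy_smooth, conservative_ood


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    s = data_rng.normal(size=(32, STATE_DIM))
    a = data_rng.normal(size=(32, ACTION_DIM))
    print(smoothing_losses(s, a, rng=np.random.default_rng(2)))
```

In a real training loop these three terms would be weighted and added to a standard offline actor-critic objective; the weights, radius `eps`, and `beta` here are placeholders rather than values reported for RORL. With `eps=0` both smoothing terms vanish, which is a quick sanity check that they only penalize sensitivity to state perturbations.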

External Datasets

D4RL