The growing prospect of deep reinforcement learning (DRL) being used in
cyber-physical systems has raised concerns around safety and robustness of
autonomous agents. Recent work on generating adversarial attacks has shown
that it is computationally feasible for a bad actor to fool a DRL policy into
behaving suboptimally. Although certain adversarial attacks under specific
attack models have been addressed, most studies focus on off-line
optimization in the data space (e.g., example fitting, distillation). This
paper introduces a Meta-Learned Advantage Hierarchy (MLAH) framework that is
attack-model-agnostic and better suited to reinforcement learning: it handles
attacks in the decision space (as opposed to the data space) and directly
mitigates the learned bias introduced by the adversary. In MLAH, we learn separate
sub-policies (nominal and adversarial) in an online manner, guided by a
supervisory master agent that detects the presence of an adversary by
leveraging the advantage function of the sub-policies. We demonstrate that the
proposed algorithm enables policy learning with significantly lower bias than
state-of-the-art policy learning approaches, even in the
presence of heavy state-information attacks. We present algorithm analysis and
simulation results using popular OpenAI Gym environments.
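
To make the advantage-gated switching idea concrete, the following is a
minimal Python sketch, assuming one-step advantage estimates are available
from the nominal sub-policy's critic. The class and parameter names
(MasterAgent, select_sub_policy, threshold, momentum) and the specific
thresholding rule are illustrative placeholders, not the paper's actual
implementation, which learns the sub-policies and the master agent jointly.

```python
def one_step_advantage(reward, value_s, value_s_next, gamma=0.99):
    """One-step advantage estimate: A(s, a) ~= r + gamma * V(s') - V(s)."""
    return reward + gamma * value_s_next - value_s


class MasterAgent:
    """Supervisory agent that selects a sub-policy ('nominal' or 'adversarial').

    A sustained collapse of the nominal sub-policy's advantage below a
    threshold is treated as evidence that state observations are being
    perturbed by an adversary. (Simplified illustration only.)
    """

    def __init__(self, threshold=-1.0, momentum=0.95):
        self.threshold = threshold  # switching threshold (hypothetical value)
        self.momentum = momentum    # smoothing factor for the running mean
        self.running_adv = 0.0      # running mean of the nominal advantage

    def select_sub_policy(self, nominal_adv):
        # Exponentially smooth the observed advantage of the nominal policy.
        self.running_adv = (self.momentum * self.running_adv
                            + (1.0 - self.momentum) * nominal_adv)
        # Route control to the adversarial sub-policy while under attack.
        return "adversarial" if self.running_adv < self.threshold else "nominal"


# Usage: feed per-step advantage estimates from the nominal critic.
master = MasterAgent()
adv = one_step_advantage(reward=-2.0, value_s=1.0, value_s_next=0.5)
print(master.select_sub_policy(adv))  # -> 'nominal' or 'adversarial'
```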