Existing neural network-based autonomous systems have been shown to be vulnerable
to adversarial attacks, so a thorough evaluation of their robustness is of great
importance. However, evaluating robustness only under worst-case scenarios
derived from known attacks is not comprehensive, and some of these scenarios
rarely occur in the real world. In addition,
the distribution of safety-critical data is usually multimodal, while most
traditional attacks and evaluation methods focus on a single mode. To address
these challenges, we propose a flow-based multimodal safety-critical
scenario generator for evaluating decision-making algorithms. The proposed
generative model is optimized with weighted likelihood maximization, and a
gradient-based sampling procedure is integrated to improve sampling
efficiency. Safety-critical scenarios are generated by querying the task
algorithms, and the log-likelihood of the generated scenarios is proportional
to their risk level. Experiments on a self-driving task demonstrate the
advantages of our method in terms of testing efficiency and multimodal
modeling capability.
We evaluate six reinforcement learning (RL) algorithms with our generated traffic
scenarios and provide empirical conclusions about their robustness.
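To make the training objective concrete, the following is a minimal sketch of weighted likelihood maximization for a flow, under strong simplifying assumptions that are not from the paper. The "flow" is a single one-dimensional affine layer (`AffineFlow1D`, hypothetical), so it is unimodal rather than multimodal. `hypothetical_risk` is a made-up stand-in for querying the task algorithm. Unadjusted Langevin dynamics (`langevin_refine`) is swapped in as one generic gradient-based sampler. The abstract does not specify the authors' architecture or sampling procedure, and both may differ.

```python
import math
import random


def hypothetical_risk(x: float) -> float:
    """Hypothetical stand-in for querying the task algorithm (e.g. an RL agent).

    Returns a non-negative risk score for a 1-D scenario parameter x.
    Assumption: scenarios near x = 2.0 are the most dangerous.
    """
    return math.exp(-((x - 2.0) ** 2))


class AffineFlow1D:
    """Toy flow x = mu + exp(log_s) * z with base density z ~ N(0, 1).

    It is unimodal, so it illustrates the training objective only,
    not multimodal modeling.
    """

    def __init__(self) -> None:
        self.mu = 0.0
        self.log_s = 0.0

    def log_prob(self, x: float) -> float:
        # Change of variables: log p(x) = log N(z; 0, 1) - log|dx/dz|.
        z = (x - self.mu) / math.exp(self.log_s)
        return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - self.log_s

    def param_grads(self, x: float):
        # Gradients of log p(x) with respect to (mu, log_s).
        s = math.exp(self.log_s)
        z = (x - self.mu) / s
        return z / s, z * z - 1.0

    def score(self, x: float) -> float:
        # Gradient of log p(x) with respect to x, used by the sampler.
        s = math.exp(self.log_s)
        return -(x - self.mu) / (s * s)

    def sample(self, rng: random.Random) -> float:
        # Exact sampling: push a base-distribution sample through the flow.
        return self.mu + math.exp(self.log_s) * rng.gauss(0.0, 1.0)


def fit_weighted_mle(flow, xs, ws, lr=0.1, steps=1000):
    """Gradient ascent on the weighted log-likelihood sum_i w_i log p(x_i)."""
    total = sum(ws)
    for _ in range(steps):
        g_mu = g_log_s = 0.0
        for x, w in zip(xs, ws):
            d_mu, d_log_s = flow.param_grads(x)
            g_mu += w * d_mu
            g_log_s += w * d_log_s
        # Normalizing by the total weight keeps the step size scale-free.
        flow.mu += lr * g_mu / total
        flow.log_s += lr * g_log_s / total
    return flow


def langevin_refine(flow, x, rng, step=0.05, n_steps=200):
    """Unadjusted Langevin dynamics on log p(x), a generic gradient-based sampler."""
    for _ in range(n_steps):
        x += 0.5 * step * flow.score(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x


def mean_risk(points):
    return sum(hypothetical_risk(p) for p in points) / len(points)


rng = random.Random(0)
# Query the (hypothetical) task algorithm on scenarios from a broad proposal.
xs = [rng.uniform(-4.0, 4.0) for _ in range(400)]
ws = [hypothetical_risk(x) for x in xs]
flow = fit_weighted_mle(AffineFlow1D(), xs, ws)
print(f"fitted mu={flow.mu:.2f}, sigma={math.exp(flow.log_s):.2f}")

# Move fresh proposals toward high-likelihood (here: high-risk) regions.
proposals = [rng.uniform(-4.0, 4.0) for _ in range(500)]
refined = [langevin_refine(flow, x, rng) for x in proposals]
print(f"mean risk: proposals={mean_risk(proposals):.2f}, refined={mean_risk(refined):.2f}")
```

For a Gaussian-family flow like this one, the weighted-likelihood optimum is the risk-weighted mean and variance of the queried scenarios, which makes the objective easy to check. After refinement, the samples concentrate near the high-risk region. This illustrates why focusing queries on high-likelihood regions of a risk-weighted model can improve testing efficiency.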