Reinforcement learning (RL) is a machine learning paradigm where an
autonomous agent learns to make an optimal sequence of decisions by interacting
with the underlying environment. The promise shown by RL-guided
workflows in solving electronic design automation problems has encouraged
hardware security researchers to use autonomous RL agents for
domain-specific problems. From the perspective of hardware security, such
autonomous agents are appealing as they can generate optimal actions in an
unknown adversarial environment. At the same time, the continued globalization
of the integrated circuit supply chain has shifted chip fabrication to
offshore, untrusted entities, raising concerns about the
security of the hardware. Furthermore, the unknown adversarial environment and
increasing design complexity make it challenging for defenders to detect subtle
modifications made by attackers (a.k.a. hardware Trojans). In this brief, we
outline the development of RL agents for detecting hardware Trojans, one of the
most challenging hardware security problems. Additionally, we discuss potential
opportunities and list the challenges of applying RL to solve hardware
security problems.