Hardware Trojans (HTs) have become a serious threat, and eliminating
them is essential to the security and safety of integrated
circuits. An effective solution is to identify HTs at the gate level via
machine learning techniques. However, machine learning has specific
vulnerabilities, such as adversarial examples. Indeed, it has been reported
that adversarially modified HTs greatly degrade the performance of a machine
learning-based HT detection method. Therefore, we propose a robust HT detection
method using adversarial training (R-HTDetector). We formally describe the
robustness of R-HTDetector against modified HTs. Our work provides the first
adversarial training for HT detection with a theoretical foundation. We show
through experiments with Trust-HUB benchmarks that R-HTDetector overcomes
adversarial examples while maintaining its original accuracy.