Outsourced deep neural networks have been demonstrated to suffer from
patch-based trojan attacks, in which an adversary poisons the training sets to
inject a backdoor into the obtained model so that regular inputs are still
labeled correctly while those carrying a specific trigger are falsely assigned a
target label. Due to the severity of such attacks, many backdoor detection and
containment systems have recently been proposed for deep neural networks. One
major category among them is model inspection schemes, which aim to detect
backdoors before deploying models obtained from untrusted third parties. In
this paper, we show that such state-of-the-art schemes can be defeated by a
so-called Scapegoat Backdoor Attack, which introduces a benign scapegoat
trigger during data poisoning to prevent the defender from reverse-engineering
the real malicious trigger. In addition, it confines the values of the network
parameters within the variance of those of a clean model during training, which
makes it significantly harder for the defender to learn the differences between
legitimate and backdoored models through machine-learning approaches. Our
experiments on three popular datasets show that the attack escapes detection by
all five state-of-the-art model inspection schemes. Moreover, it incurs almost
no loss of attack effectiveness and preserves the universality of the trigger
compared with the original patch-based trojan attacks.
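
To make the two mechanisms concrete, the following is a minimal PyTorch-style sketch based only on the description above. The poisoning rule (scapegoat-stamped samples keep their correct labels while real-trigger samples receive the target label) and the confinement rule (clamping weights to a clean model's per-parameter mean plus or minus a few standard deviations) are assumptions about one plausible implementation, not the authors' released code, and all function and variable names are hypothetical.

```python
# Illustrative sketch only: the poisoning and parameter-confinement rules below
# are assumptions inferred from the abstract, not the paper's actual method.
import torch
import torch.nn.functional as F


def stamp(x, trigger, mask):
    """Paste a small patch trigger onto a batch of images."""
    return x * (1 - mask) + trigger * mask


def poison_batch(x, y, real_trigger, real_mask, scape_trigger, scape_mask,
                 target_label, poison_frac=0.1):
    """Assumed poisoning rule: a fraction of samples carry the real trigger and
    the target label, while another fraction carry the benign scapegoat trigger
    but keep their original (correct) labels."""
    n = x.size(0)
    k = max(1, int(poison_frac * n))
    x, y = x.clone(), y.clone()
    x[:k] = stamp(x[:k], real_trigger, real_mask)               # real backdoor
    y[:k] = target_label
    x[k:2 * k] = stamp(x[k:2 * k], scape_trigger, scape_mask)   # decoy, label unchanged
    return x, y


def clean_param_bounds(clean_model, num_std=2.0):
    """Per-parameter bounds (mean +/- num_std * std) taken from a clean model."""
    bounds = {}
    for name, p in clean_model.named_parameters():
        mu, sigma = p.mean().item(), p.std().item()
        bounds[name] = (mu - num_std * sigma, mu + num_std * sigma)
    return bounds


def train_step(model, bounds, optimizer, x, y):
    """One training step on (possibly poisoned) data, followed by clamping the
    parameters into the value range observed in the clean reference model."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            lo, hi = bounds[name]
            p.clamp_(lo, hi)   # keep trojaned weights statistically close to benign ones
    return loss.item()
```

An alternative reading of the variance constraint is a soft penalty that matches each layer's parameter variance to the clean model's instead of hard clamping; either variant aims to make the trojaned weights indistinguishable from benign ones to a learned meta-classifier.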