This paper reveals a data bias issue that can severely degrade performance
when training a machine learning model for malicious URL detection. We
describe how such bias can be identified using interpretable machine learning
techniques, and further argue that such biases arise naturally in the real-world
security data used to train classification models. We then propose a
debiased training strategy that can be applied to most deep-learning-based
models to alleviate the negative effects of biased features. The solution
is based on self-supervised adversarial training, which trains deep
neural networks to learn invariant embeddings from biased data. We conduct a
wide range of experiments to demonstrate that the proposed strategy can yield
significantly better generalization for both CNN-based and RNN-based
detection models.