Abstract
The goal of Domain Generation Algorithm (DGA) detection is to recognize
infections with bot malware. It is often done with the help of Machine Learning
approaches that classify non-resolving Domain Name System (DNS) traffic and are
trained on possibly sensitive data. In parallel, the rise of privacy research
in the Machine Learning world leads to privacy-preserving measures that are
tightly coupled with a deep learning model's architecture or training routine,
while non-deep-learning approaches are commonly better suited for the
application of privacy-enhancing methods outside the actual classification
module. In this work, we aim to measure the privacy capability of the feature
extractor of the feature-based DGA detector FANCI (Feature-based Automated Nxdomain
Classification and Intelligence). Our goal is to assess whether a data-rich
adversary can learn an inverse mapping of FANCI's feature extractor and thereby
reconstruct domain names from feature vectors. Attack success would pose a
privacy threat to sharing FANCI's feature representation, while the opposite
would enable this representation to be shared without privacy concerns. Using
three real-world data sets, we train a recurrent Machine Learning model on the
reconstruction task. Our approaches result in poor reconstruction performance,
and we attempt to back our findings with a mathematical review of the feature
extraction process. We thus reckon that sharing FANCI's feature representation
does not constitute a considerable privacy leak.
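
To illustrate why inverting a feature extractor can be hard, the following minimal sketch maps domain names to a small vector of aggregate statistics and groups the names that collide on the same vector. The four features used here (total length, label count, digit count, vowel count) and the function names are hypothetical and chosen only for illustration; they are not FANCI's actual feature set. The point is merely that a many-to-one mapping leaves an adversary with several equally plausible preimages for a given feature vector.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def toy_features(domain: str) -> tuple:
    """Map a domain name to a small vector of aggregate statistics.

    Hypothetical stand-in for a feature extractor; NOT FANCI's feature set.
    """
    labels = domain.lower().split(".")
    chars = "".join(labels)
    return (
        len(domain),                       # total length including dots
        len(labels),                       # number of dot-separated labels
        sum(c.isdigit() for c in chars),   # digit count
        sum(c in VOWELS for c in chars),   # vowel count
    )


def group_by_features(domains):
    """Group domain names by their feature vector."""
    groups = defaultdict(list)
    for d in domains:
        groups[toy_features(d)].append(d)
    return dict(groups)


domains = ["abcde.com", "edcba.com", "xk3f9.net", "9fx3k.net", "example.org"]

# Keep only the feature vectors shared by more than one domain name.
collisions = {
    vec: names
    for vec, names in group_by_features(domains).items()
    if len(names) > 1
}

for vec, names in collisions.items():
    print(vec, names)
```

In this toy setting, permuting the characters within a label never changes the feature vector, so the vector alone cannot identify the original name. Whether a comparable argument holds for a real, richer feature set is exactly what a reconstruction experiment would have to test.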