With the increasing adoption of AI, inherent security and privacy vulnerabilities in machine learning systems are being discovered. One such vulnerability makes it possible for an adversary to obtain private information about the types of instances used to train the targeted machine learning model. This so-called model inversion attack is based on sequential leveraging of classification scores towards obtaining high-confidence representations for various classes. However, for deep networks, such procedures usually lead to unrecognizable representations that are useless to the adversary. In this paper, we introduce a more realistic definition of model inversion, where the adversary is aware of the general purpose of the attacked model (for instance, whether it is an OCR system or a facial recognition system), and the goal is to find realistic class representations within the corresponding lower-dimensional manifold (of, respectively, general symbols or general faces). To that end, we leverage properties of generative adversarial networks to construct a connected lower-dimensional manifold, and demonstrate the efficiency of our model inversion attack carried out within that manifold.
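As a minimal illustrative sketch of the general idea (not the paper's actual implementation), the PyTorch snippet below performs model inversion inside a generator's latent space: instead of optimizing pixels directly, a latent code is optimized so that the generated sample receives a high classification score for the target class. The names `G`, `f`, `latent_dim`, and `invert_class` are hypothetical placeholders; in practice `G` would be a pretrained GAN generator defining the lower-dimensional manifold and `f` the attacked classifier.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for illustration only: in a real attack, G is a
# pretrained GAN generator and f is the targeted classifier.
latent_dim, num_classes = 128, 10
G = torch.nn.Sequential(torch.nn.Linear(latent_dim, 784), torch.nn.Tanh())
f = torch.nn.Linear(784, num_classes)

def invert_class(target_class: int, steps: int = 500, lr: float = 0.05) -> torch.Tensor:
    """Search the generator's latent space for a sample that the
    classifier assigns to `target_class` with high confidence."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = G(z)                      # candidate stays on the GAN manifold
        loss = F.cross_entropy(f(x), torch.tensor([target_class]))
        loss.backward()               # gradient flows through G into z
        opt.step()
    return G(z).detach()              # recovered class representation

representative = invert_class(target_class=3)
```

Because every candidate is produced by the generator, the search is constrained to realistic samples (e.g., plausible faces or symbols), which is what distinguishes this formulation from pixel-space inversion, where the same score maximization typically yields unrecognizable noise.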