Recent work on adversarial learning has focused mainly on neural networks and
domains where those networks excel, such as computer vision or audio
processing. The data in these domains is typically homogeneous, whereas
heterogeneous tabular domains remain underexplored despite their
prevalence. When searching for adversarial patterns within heterogeneous input
spaces, an attacker must preserve both the complex domain-specific validity
rules of the data and the adversarial nature of the identified samples.
As such, applying adversarial manipulations to heterogeneous datasets
has proved to be a challenging task, and no generic attack method has been suggested
thus far. We, however, argue that machine learning models trained on
heterogeneous tabular data are as susceptible to adversarial manipulations as
those trained on continuous or homogeneous data such as images. To support our
claim, we introduce a generic optimization framework for identifying
adversarial perturbations in heterogeneous input spaces. We define
distribution-aware constraints for preserving the consistency of the
adversarial examples and incorporate them by embedding the heterogeneous input
into a continuous latent space. Due to the nature of the underlying datasets, we
focus on $\ell_0$ perturbations and demonstrate their applicability in real-life
settings. We evaluate the effectiveness of our approach using three datasets
from different content domains. Our results show that despite the
constraints imposed on input validity in heterogeneous datasets, machine
learning models trained using such data are still equally susceptible to
adversarial examples.
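
To make the $\ell_0$ notion above concrete, the following minimal sketch counts how many features differ between an original tabular record and a perturbed one. It is an illustration only and not the framework described here: the function name `l0_distance`, the feature names, and the example values are all hypothetical.

```python
from typing import Any, Mapping


def l0_distance(original: Mapping[str, Any], perturbed: Mapping[str, Any]) -> int:
    """Count the features whose values differ between two records."""
    if original.keys() != perturbed.keys():
        raise ValueError("records must share the same feature names")
    return sum(1 for name in original if original[name] != perturbed[name])


# Hypothetical record mixing numeric, categorical, and boolean features.
original = {"age": 42, "occupation": "engineer", "income": 58000.0, "owns_home": True}
perturbed = {"age": 42, "occupation": "teacher", "income": 58000.0, "owns_home": False}

# Two features were changed, regardless of their types or magnitudes.
distance = l0_distance(original, perturbed)
print(distance)
```

Counting changed features, rather than measuring the size of each change, treats numeric and categorical columns uniformly, which is one reason an $\ell_0$ budget is a natural fit for mixed-type data.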