Machine learning algorithms, however effective, are known to be vulnerable in
adversarial scenarios where a malicious user may inject manipulated instances.
In this work we focus on evasion attacks, where a model is trained in a safe
environment and exposed to attacks at test time. The attacker aims to find a
minimal perturbation of a test instance that changes the model outcome.
We propose a model-agnostic strategy that builds a robust ensemble by
training its base models on feature-based partitions of the given dataset. Our
algorithm guarantees that the majority of the models in the ensemble cannot be
affected by the attacker. We experiment with the proposed strategy on decision
tree ensembles, and we also propose an approximate certification method for
tree ensembles that efficiently assesses the minimal accuracy of a forest on a
given dataset, avoiding the costly computation of evasion attacks.
Experimental evaluation on publicly available datasets shows that the proposed
strategy outperforms state-of-the-art adversarial learning algorithms against
evasion attacks.
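
To make the idea concrete, the following is a minimal sketch of a feature-partitioned ensemble with majority voting, assuming scikit-learn decision trees as base models; the dataset, the number of partitions, and all parameter choices are hypothetical illustrations rather than the authors' actual implementation.

    # Illustrative sketch (not the paper's implementation): a feature-partitioned
    # ensemble of decision trees with majority voting.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # Toy data: 1000 instances, 20 features, binary labels (hypothetical).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Randomly split the feature indices into disjoint groups; each base tree
    # only ever sees the features of its own group.
    n_partitions = 7
    partitions = np.array_split(rng.permutation(X.shape[1]), n_partitions)

    trees = []
    for feats in partitions:
        t = DecisionTreeClassifier(max_depth=4, random_state=0)
        t.fit(X[:, feats], y)
        trees.append((feats, t))

    def predict(X_test):
        # Majority vote over the base trees. An attacker who perturbs at most k
        # features can influence at most k base trees, so with more than 2k
        # partitions the majority of the votes is guaranteed to be unaffected.
        votes = np.stack([t.predict(X_test[:, feats]) for feats, t in trees])
        return (votes.mean(axis=0) >= 0.5).astype(int)

    print("training accuracy:", (predict(X) == y).mean())

Because the feature groups are disjoint, a perturbation confined to a few features can only flip the votes of the trees trained on those features, which is the intuition behind the majority-robustness guarantee stated above.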