We introduce a form of steganography in the domain of machine learning which
we call training set camouflage. Imagine Alice has a training set on an illicit
machine learning classification task. Alice wants Bob (a machine learning
system) to learn the task. However, sending either the training set or the
trained model to Bob can raise suspicion if the communication is monitored.
Training set camouflage allows Alice to compute a second training set on a
completely different -- and seemingly benign -- classification task. By
construction, sending the second training set will not raise suspicion. When
Bob applies his standard (public) learning algorithm to the second training
set, he approximately recovers the classifier on the original task. Training
set camouflage is a novel form of steganography in machine learning. We
formulate training set camouflage as a combinatorial bilevel optimization
problem and propose solvers based on nonlinear programming and local search.
Experiments on real classification tasks demonstrate the feasibility of such
camouflage.