Binary code authorship identification determines authors of a binary program.
Existing techniques have used supervised machine learning for this task. In
this paper, we look at this problem from an attacker's perspective. We aim to
modify a test binary such that it not only causes misprediction but also
maintains the functionality of the original input binary. Attacks against
binary code are intrinsically more difficult than attacks against domains such
as computer vision, where attackers can change each pixel of the input image
independently and still maintain a valid image. For binary code, even flipping
a single bit may cause the binary to become invalid, to crash at
runtime, or to lose its original functionality. We investigate two types of
attacks: untargeted attacks, causing misprediction to any of the incorrect
authors, and targeted attacks, causing misprediction to a specific one among
the incorrect authors. We develop two key attack capabilities: feature vector
modification, generating an adversarial feature vector that both corresponds to
a real binary and causes the required misprediction, and input binary
modification, modifying the input binary to match the adversarial feature
vector while maintaining the functionality of the input binary. We evaluated
our attack against classifiers trained with a state-of-the-art method for
authorship attribution. The classifiers for authorship identification have 91%
accuracy on average. Our untargeted attack has a 96% success rate on average,
showing that we can effectively suppress the authorship signal. Our targeted
attack has a 46% success rate on average, showing that it is possible, though
significantly more difficult, to impersonate a specific programmer's style. Our
attack reveals that existing binary code authorship identification techniques
rely on code features that are easy to modify, and thus are vulnerable to
attacks.
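To make the two attack objectives concrete, the sketch below builds a toy linear "authorship classifier" over feature counts and greedily modifies a feature vector until the prediction flips. The classifier, feature dimensions, and greedy search are all illustrative assumptions, not the paper's method, and the harder second capability, rebuilding a real, functional binary that matches the adversarial feature vector, is not modeled here.

```python
import numpy as np

# Hypothetical stand-in for an authorship classifier: a linear model over
# integer code-feature counts. All shapes and weights are illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # 3 hypothetical authors, 4 hypothetical features
b = np.zeros(3)

def predict(x):
    """Predicted author index for feature vector x."""
    return int(np.argmax(W @ x + b))

def attack(x, true_author, target=None, max_steps=100):
    """Greedily change one feature count at a time.
    Untargeted (target is None): stop once the prediction is any wrong author.
    Targeted: stop once the prediction equals the chosen target author."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_steps):
        pred = predict(x)
        if (target is None and pred != true_author) or pred == target:
            return x
        best, best_score = None, -np.inf
        for i in range(len(x)):          # try every single-feature +/-1 change
            for d in (-1.0, 1.0):
                cand = x.copy()
                cand[i] += d
                logits = W @ cand + b
                if target is None:       # margin of best wrong author over true one
                    score = np.max(np.delete(logits, true_author)) - logits[true_author]
                else:                    # margin of target author over best rival
                    score = logits[target] - np.max(np.delete(logits, target))
                if score > best_score:
                    best, best_score = cand, score
        x = best                         # take the most helpful single change
    return x
```

The untargeted stopping condition (any wrong author) is much easier to satisfy than the targeted one (one specific author), which mirrors the gap between the 96% and 46% success rates reported above.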