Data encryption is the primary method of protecting the privacy of consumer
devices' Internet communications from network observers. The ability to
automatically detect unencrypted data in network traffic is therefore an
essential tool for auditing Internet-connected devices. Existing methods
identify network packets containing cleartext but cannot differentiate packets
containing encrypted data from packets containing compressed unencrypted data,
whose contents can easily be recovered by reversing the compression algorithm.
This makes it difficult for consumer protection advocates to identify devices
that put user privacy at risk by sending sensitive data in compressed but
unencrypted form.
Here, we present the first technique to automatically distinguish encrypted
from compressed unencrypted network transmissions on a per-packet basis. We
apply three machine learning models and achieve a maximum accuracy of 66.9%
with a convolutional neural network trained on raw packet data. This result
establishes a baseline for this previously unstudied machine learning problem,
which we hope will motivate further attention and accuracy improvements. To facilitate
continuing research on this topic, we have made our training and test datasets
available to the public.
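The difficulty described above, that compressed unencrypted data is hard to tell apart from encrypted data, can be illustrated with a common byte-level heuristic: Shannon entropy. The sketch below is illustrative only and assumes things not stated in the abstract: a hypothetical word vocabulary stands in for sensitive cleartext, zlib stands in for the compression a device might use, and `os.urandom` bytes stand in for ciphertext (well-encrypted data should look statistically random). It is not the method evaluated here, and it does not claim that existing tools use entropy specifically.

```python
import math
import os
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Hypothetical "sensitive" cleartext: words drawn from a small illustrative vocabulary.
rng = random.Random(0)
vocab = ["user", "email", "location", "password", "device", "token",
         "latitude", "longitude", "contact", "name", "phone", "address"]
plaintext = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")

# zlib (DEFLATE) is an assumed stand-in for whatever compression a device applies.
compressed = zlib.compress(plaintext, 9)

# Random bytes approximate the statistics of well-encrypted data (assumption).
ciphertext_stand_in = os.urandom(len(compressed))

for label, blob in [("plaintext", plaintext),
                    ("zlib-compressed", compressed),
                    ("random (ciphertext stand-in)", ciphertext_stand_in)]:
    print(f"{label:30s} {len(blob):7d} bytes  {shannon_entropy(blob):.2f} bits/byte")

# Compression offers no confidentiality: any observer can reverse it.
assert zlib.decompress(compressed) == plaintext
```

Cleartext scores low and is easy to flag, but the compressed and random outputs both land close to the 8 bits/byte ceiling, so a threshold on this statistic separates cleartext from everything else without separating compressed data from ciphertext.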