The rapid growth of cloud computing and machine learning as a
service has led to the widespread use of media software to process confidential
media data. This paper explores an adversary's ability to launch side channel
analyses (SCA) against media software to reconstruct confidential media inputs.
Recent advances in representation learning and perceptual learning inspired us
to consider the reconstruction of media inputs from side channel traces as a
cross-modality manifold learning task that can be addressed in a unified manner
with an autoencoder framework trained to learn the mapping between media inputs
and side channel observations. We further enhance the autoencoder with
attention to localize the program points that contribute most to the SCA,
thus automatically pinpointing information-leakage points in media
software. We also propose a novel and highly effective defensive technique
called perception blinding that can perturb media inputs with perception masks
and mitigate manifold learning-based SCA.
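The attention-based localization of leakage points can be illustrated with a minimal NumPy sketch. It assumes that each position of a side channel trace corresponds to one program point (for example, one memory-accessing instruction) and has already been embedded as a feature vector. A single scoring vector `w` (hypothetical, standing in for a trained attention layer) produces softmax weights, and the highest-weighted positions are reported as candidate leakage points. This is a generic attention-pooling illustration, not the paper's actual architecture.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features: np.ndarray, w: np.ndarray):
    """features: (T, d) embeddings, one row per trace position / program point.
    w: (d,) scoring vector; a hypothetical stand-in for a trained attention layer."""
    scores = features @ w          # (T,) one relevance score per position
    weights = softmax(scores)      # (T,) non-negative, sums to 1
    pooled = weights @ features    # (d,) weighted summary passed on to a decoder
    return pooled, weights

def top_leakage_points(weights: np.ndarray, k: int) -> np.ndarray:
    # Positions with the largest attention weights are candidate leakage points.
    return np.argsort(weights)[::-1][:k]

# Toy example: 8 trace positions with 4-dim features; only positions 2 and 5
# carry signal along the direction the (hypothetical) scorer attends to.
features = np.zeros((8, 4))
features[2, 0] = 3.0
features[5, 0] = 2.0
w = np.array([1.0, 0.0, 0.0, 0.0])

pooled, weights = attention_pool(features, w)
print(top_leakage_points(weights, 2))  # [2 5]
```

In a trained model, the scorer is learned jointly with the reconstruction objective. Positions that receive high weight are the ones whose observations most help reconstruct the input, which is why they serve as leakage candidates.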
Our evaluation targets three popular media software packages to reconstruct
inputs in image, audio, and text formats. We analyze three common side channels
(cache bank, cache line, and page tables) as well as userspace-only cache set
accesses logged by standard Prime+Probe. Our framework successfully reconstructs
high-quality confidential inputs from the assessed media software and
automatically pinpoints their vulnerable program points, many of which were
previously unknown to the public. We further show that perception blinding can mitigate
manifold learning-based SCA with negligible extra cost.
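The side channels listed above differ mainly in how coarsely each one reveals a memory address. The following minimal sketch models each observation as the accessed address with its low-order bits discarded. It assumes 4-byte cache banks, 64-byte cache lines, 4 KiB pages, and a 64-set L1 data cache, which is a common modeling convention and typical of Intel x86 CPUs but is not taken from this paper. The table base address and entry size are hypothetical and stand in for a secret-indexed lookup in media software.

```python
# Assumed geometry (typical Intel x86; not taken from the paper).
BANK_SHIFT = 2      # 4-byte cache banks
LINE_SHIFT = 6      # 64-byte cache lines
PAGE_SHIFT = 12     # 4 KiB pages
L1D_SETS = 64       # e.g. 32 KiB, 8-way, 64-byte lines

def observations(addr: int) -> dict:
    """What each modeled side channel reveals about one access at `addr`."""
    return {
        "cache_bank": addr >> BANK_SHIFT,
        "cache_line": addr >> LINE_SHIFT,
        "page": addr >> PAGE_SHIFT,
        # Prime+Probe sees only the L1D set index, not the full line address.
        "cache_set": (addr >> LINE_SHIFT) & (L1D_SETS - 1),
    }

# Hypothetical lookup table of 4-byte entries indexed by a secret byte
# (e.g. a pixel- or sample-dependent table lookup).
TABLE_BASE = 0x10000  # hypothetical, 4 KiB-aligned
ENTRY_SIZE = 4

def lookup_address(secret_byte: int) -> int:
    return TABLE_BASE + secret_byte * ENTRY_SIZE

if __name__ == "__main__":
    for s in (0x50, 0x5A, 0x5F):
        print(hex(s), observations(lookup_address(s)))
```

Under these assumptions, the cache-bank view recovers the full secret byte. The cache-line and cache-set views reveal only its upper four bits (`secret >> 4`). The page view reveals nothing, because the whole 1 KiB table fits in one page. Each trace therefore leaks a different, partial view of the input, and this partial information is what a learned reconstruction model must exploit.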