Abstract
A learned database system uses machine learning (ML) internally to improve
performance. We can expect such systems to be vulnerable to some adversarial-ML
attacks. Often, the learned component is shared between mutually distrusting
users or processes, much like microarchitectural resources such as caches,
potentially giving rise to highly realistic attacker models. However, compared
to attacks on other ML-based systems, attackers face a level of indirection as
they cannot interact directly with the learned model. Additionally, the
difference between the attack surface of learned and non-learned versions of
the same system is often subtle. These factors obscure the de facto risks
that the incorporation of ML carries. We analyze the root causes of the
potentially increased attack surface in learned database systems and develop a
framework for identifying vulnerabilities that stem from the use of ML. We
apply our framework to a broad set of learned components currently being
explored in the database community. To empirically validate the vulnerabilities
surfaced by our framework, we select three of them and implement and evaluate
exploits against them. We show that the use of ML causes leakage of past
queries in a database, enables a poisoning attack that causes exponential memory
blowup in an index structure and crashes it in seconds, and enables index users
to snoop on each other's key distributions by timing queries over their own
keys. We find that adversarial ML is a universal threat against learned
components in database systems, point to open research gaps in our
understanding of learned-systems security, and conclude by discussing
mitigations, while noting that data leakage is inherent in systems whose
learned component is shared between multiple parties.