Abstract
A learned database system uses machine learning (ML) internally to improve
performance. We can expect such systems to be vulnerable to some adversarial-ML
attacks. Often, the learned component is shared between mutually distrusting
users or processes, much like microarchitectural resources such as caches,
potentially giving rise to highly realistic attacker models. However, compared
to attacks on other ML-based systems, attackers face a level of indirection as
they cannot interact directly with the learned model. Additionally, the
difference between the attack surface of learned and non-learned versions of
the same system is often subtle. These factors obscure the de facto risks
that the incorporation of ML carries. We analyze the root causes of the
potentially increased attack surface in learned database systems and develop a
framework for identifying vulnerabilities that stem from the use of ML. We
apply our framework to a broad set of learned components currently being
explored in the database community. To empirically validate the vulnerabilities
surfaced by our framework, we select three of them and implement and evaluate
exploits against them. We show that the use of ML causes leakage of past
queries in a database, enables a poisoning attack that causes exponential memory
blowup in an index structure and crashes it in seconds, and enables index users
to snoop on each other's key distributions by timing queries over their own
keys. We find that adversarial ML is a universal threat against learned
components in database systems, point to open research gaps in our
understanding of learned-systems security, and conclude by discussing
mitigations, while noting that data leakage is inherent in systems whose
learned component is shared between multiple parties.