Abstract
Trusted execution environments (TEEs) for machine learning accelerators are
indispensable for secure and efficient ML inference. Optimizing workloads
through state-space exploration of the accelerator architecture improves
performance and reduces energy consumption. However, such explorations are
expensive and slow due to the large search space. Current research therefore
resorts to fast analytical models that forgo critical hardware details and
cross-layer opportunities unique to the hardware security primitives. While
cycle-accurate models can in principle reach better designs, their high runtime
cost restricts them to a much smaller state space.
We present Obsidian, an optimization framework for finding the optimal
mapping from ML kernels to a secure ML accelerator. Obsidian addresses the
above challenge by exploring the state space using analytical and
cycle-accurate models cooperatively. The two main exploration components
are: (1) a secure accelerator analytical model that accounts for the effect of
secure hardware while traversing the large mapping state space and produces the
best m candidate mappings; (2) a compiler profiling step on a cycle-accurate
model that captures runtime bottlenecks to further improve execution time,
energy, and resource utilization, and finds the optimal model mapping.
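To make the two-phase flow concrete, the following Python sketch shows how a
cheap analytical model can prune a large mapping space down to m candidates
that a slow cycle-accurate model then arbitrates. The mapping encoding, both
cost functions, and all names here are illustrative assumptions, not
Obsidian's actual interfaces.

```python
# A minimal sketch of cooperative two-phase mapping search, assuming a
# hypothetical (tile size, loop order) mapping encoding and toy cost
# models; the paper's analytical and cycle-accurate models are far
# more detailed.
import heapq
import itertools

def analytical_cost(mapping):
    # Fast, approximate estimate. A real secure-accelerator model would
    # also charge for security primitives (e.g. memory encryption and
    # integrity verification) on every off-chip access.
    tile, loop_order = mapping
    return abs(tile - 48) + {"nchw": 0.0, "nhwc": 0.5, "chwn": 1.0}[loop_order]

def cycle_accurate_cost(mapping):
    # Slow but precise "simulation": exposes a runtime bottleneck
    # (a pipeline stall, say) that the analytical model cannot see.
    tile, loop_order = mapping
    stall = 2.0 if loop_order == "nchw" and tile >= 64 else 0.0
    return analytical_cost(mapping) + stall

def explore(mapping_space, m=8):
    # Phase 1: sweep the large state space with the cheap analytical
    # model and keep only the m most promising mappings.
    top_m = heapq.nsmallest(m, mapping_space, key=analytical_cost)
    # Phase 2: profile the m survivors on the expensive cycle-accurate
    # model and return the best mapping found.
    return min(top_m, key=cycle_accurate_cost)

# Example: a small grid of candidate mappings.
space = list(itertools.product([16, 32, 64, 128], ["nchw", "nhwc", "chwn"]))
print("best mapping:", explore(space, m=4))
```

The design point this illustrates: the analytical model only has to rank
mappings well enough that the true optimum survives into the top m, after
which the cycle-accurate model pays its high per-mapping cost on just m
candidates instead of the whole space.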
We compare our results to a baseline secure accelerator comprising the
state-of-the-art security schemes of GuardNN [33] and Sesame [11].
The analytical model reduces inference latency by 20.5% for a cloud
deployment and 8.4% for an edge deployment, with energy improvements of 24%
and 19%, respectively. The cycle-accurate model further reduces latency by
9.1% for cloud and 12.2% for edge, with energy improvements of 13.8% and 13.1%,
respectively.