Abstract
This paper presents an approach to developing assurance cases for adversarial
robustness and regulatory compliance in large language models (LLMs). Focusing
on both natural and code language tasks, we explore the vulnerabilities these
models face, including adversarial attacks based on jailbreaking, heuristics,
and randomization. We propose a layered framework incorporating guardrails at
various stages of LLM deployment, aimed at mitigating these attacks and
ensuring compliance with the EU AI Act. Our approach includes a meta-layer for
dynamic risk management and reasoning, which is crucial for addressing the
evolving nature of LLM vulnerabilities. We illustrate our method with two
example assurance cases, highlighting how different contexts demand tailored
strategies to ensure robust and compliant AI systems.