Abstract
As large language models (LLMs) are increasingly integrated into real-world
applications, ensuring their safety, robustness, and privacy compliance has
become critical. We present OpenGuardrails, the first fully open-source
platform that unifies large-model-based safety detection, manipulation defense,
and deployable guardrail infrastructure. OpenGuardrails protects against three
major classes of risks: (1) content-safety violations such as harmful or
explicit text generation, (2) model-manipulation attacks including prompt
injection, jailbreaks, and code-interpreter abuse, and (3) data leakage
involving sensitive or private information. Unlike prior modular or rule-based
frameworks, OpenGuardrails introduces three core innovations: (1) a
Configurable Policy Adaptation mechanism that allows per-request customization
of unsafe categories and sensitivity thresholds; (2) a Unified LLM-based Guard
Architecture that performs both content-safety and manipulation detection
within a single model; and (3) a Quantized, Scalable Model Design that
compresses a 14B dense base model to 3.3B via GPTQ while preserving over 98% of
benchmark accuracy. The system supports 119 languages, achieves
state-of-the-art performance across multilingual safety benchmarks, and can be
deployed as a secure gateway or API-based service for enterprise use. All
models, datasets, and deployment scripts are released under the Apache 2.0
license.
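
To make the idea of per-request policy adaptation concrete, here is a minimal
sketch. It assumes a guard model that returns one risk score in [0, 1] per
category. The abstract does not describe the actual interface, so the names
RequestPolicy and flagged_categories, the category labels, and the score format
are all hypothetical and are not taken from the OpenGuardrails API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPolicy:
    """Hypothetical per-request policy (not the OpenGuardrails schema).

    Maps each enabled unsafe category to a sensitivity threshold.
    Categories absent from the mapping are not checked for this request.
    """
    thresholds: dict[str, float]


def flagged_categories(scores: dict[str, float], policy: RequestPolicy) -> list[str]:
    """Return enabled categories whose score meets or exceeds its threshold.

    `scores` stands in for the guard model's output; a category missing
    from `scores` is treated as having score 0.0.
    """
    return sorted(
        category
        for category, threshold in policy.thresholds.items()
        if scores.get(category, 0.0) >= threshold
    )


# Assumed example scores for a single piece of content.
scores = {"violence": 0.72, "prompt_injection": 0.40, "pii": 0.10}

# Two callers apply different policies to the same scores.
strict = RequestPolicy({"violence": 0.5, "prompt_injection": 0.3, "pii": 0.5})
lenient = RequestPolicy({"violence": 0.9})

print(flagged_categories(scores, strict))
print(flagged_categories(scores, lenient))
```

In this sketch the same model output yields different decisions depending on
the policy supplied with the request, which is the behavior the abstract
attributes to Configurable Policy Adaptation. How the real system encodes
categories and thresholds may differ.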