OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

arXiv

Source

https://arxiv.org/abs/2510.19169

PDF

https://arxiv.org/pdf/2510.19169

Paper Information

Authors: Thomas Wang, Haowen Li
Published: 10-22-2025
Updated: 10-29-2025
Affiliation: OpenGuardrails.com
Country: United States of America

Labels Estimated by AI

Platform Architecture, Dynamic Policy Adaptation, Author Contribution

These labels were automatically added by AI and may be inaccurate.

Abstract

As large language models (LLMs) are increasingly integrated into real-world applications, ensuring their safety, robustness, and privacy compliance has become critical. We present OpenGuardrails, the first fully open-source platform that unifies large-model-based safety detection, manipulation defense, and deployable guardrail infrastructure. OpenGuardrails protects against three major classes of risks: (1) content-safety violations such as harmful or explicit text generation, (2) model-manipulation attacks including prompt injection, jailbreaks, and code-interpreter abuse, and (3) data leakage involving sensitive or private information. Unlike prior modular or rule-based frameworks, OpenGuardrails introduces three core innovations: (1) a Configurable Policy Adaptation mechanism that allows per-request customization of unsafe categories and sensitivity thresholds; (2) a Unified LLM-based Guard Architecture that performs both content-safety and manipulation detection within a single model; and (3) a Quantized, Scalable Model Design that compresses a 14B dense base model to 3.3B via GPTQ while preserving over 98% of benchmark accuracy. The system supports 119 languages, achieves state-of-the-art performance across multilingual safety benchmarks, and can be deployed as a secure gateway or API-based service for enterprise use. All models, datasets, and deployment scripts are released under the Apache 2.0 license.
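
The per-request policy idea in the abstract (a caller chooses which unsafe categories are enforced and how sensitive the check is) can be illustrated with a minimal Python sketch. This is not the OpenGuardrails API: the names `RequestPolicy` and `apply_policy`, the category labels, and the scores are all hypothetical, and the scores stand in for whatever per-category output a guard model might produce.

```python
from dataclasses import dataclass


@dataclass
class RequestPolicy:
    """Hypothetical per-request policy: enforced categories plus a sensitivity threshold."""
    enabled_categories: set[str]
    threshold: float = 0.5  # a category is flagged when its score >= threshold


def apply_policy(scores: dict[str, float], policy: RequestPolicy) -> list[str]:
    """Return the enabled categories whose score meets the threshold, sorted by name."""
    return sorted(
        category
        for category, score in scores.items()
        if category in policy.enabled_categories and score >= policy.threshold
    )


# Made-up scores standing in for a guard model's per-category output (assumption).
scores = {"violence": 0.72, "prompt_injection": 0.41, "pii_leak": 0.10}

# Two callers apply different policies to the same model output.
strict = RequestPolicy({"violence", "prompt_injection", "pii_leak"}, threshold=0.4)
lenient = RequestPolicy({"violence", "prompt_injection"}, threshold=0.8)

print(apply_policy(scores, strict))   # ['prompt_injection', 'violence']
print(apply_policy(scores, lenient))  # []
```

The point of the sketch is only that the same model output can yield different decisions depending on the policy sent with each request; how OpenGuardrails actually encodes and applies its policies is described in the paper itself.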