Robustness Analysis

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

Authors: Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma | Published: 2026-01-08
Prompt Injection
Robustness Analysis
Robustness of Deep Networks

Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression

Authors: Weiyi He, Yue Xing | Published: 2025-12-10
エラー解析
Robustness Analysis
一般化境界

Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems

Authors: Mostafa Jafari, Alireza Shameli-Sendi | Published: 2025-05-14
Robustness Analysis
Attack Detection Method
Adversarial Learning

SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models

Authors: Amirhossein Dabiriaghdam, Lele Wang | Published: 2025-02-05 | Updated: 2025-09-11
Robustness Analysis
Digital Watermarking for Generative AI
Watermark Design

PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Authors: Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal | Published: 2020-05-17 | Updated: 2021-03-31
Robustness Analysis
Adversarial attack
Feature Extraction Method

Improved Image Wasserstein Attacks and Defenses

Authors: Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang | Published: 2020-04-26 | Updated: 2023-05-09
Robustness Analysis
Adversarial Example
Adversarial Attack Methods

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers

Authors: Loc Truong, Chace Jones, Brian Hutchinson, Andrew August, Brenda Praggastis, Robert Jasper, Nicole Nichols, Aaron Tuor | Published: 2020-04-24
Backdoor Attack
Robustness Analysis
Regularization

How to compare adversarial robustness of classifiers from a global perspective

Authors: Niklas Risse, Christina Göpfert, Jan Philip Göpfert | Published: 2020-04-22 | Updated: 2020-10-15
Poisoning
Robustness Analysis
Evaluation Method

Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method

Authors: J. Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro | Published: 2020-04-03
3D Feature Extraction
Robustness Analysis
Face Recognition

A simple way to make neural networks robust against diverse image corruptions

Authors: Evgenia Rusak, Lukas Schott, Roland S. Zimmermann, Julian Bitterwolf, Oliver Bringmann, Matthias Bethge, Wieland Brendel | Published: 2020-01-16 | Updated: 2020-07-22
Robustness Analysis
Convergence analysis
Adversarial Learning