Fail-Closed Alignment for Large Language Models | Authors: Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong | Published: 2026-02-19 | Prompt Injection, Robustness Evaluation, Defense Method | 2026.02.19 2026.02.21 | Literature Database
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information | Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan | Published: 2023-11-20 | Updated: 2024-02-18 | Prompt Injection, Prompt validation, Robustness Evaluation | 2023.11.20 2025.05.28
Instability of computer vision models is a necessary result of the task itself | Authors: Oliver Turnbull, George Cevora | Published: 2023-10-26 | Robustness Evaluation, Adversarial Example, Dimensionality Reduction Method | 2023.10.26 2025.05.28
Attesting Distributional Properties of Training Data for Machine Learning | Authors: Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan | Published: 2023-08-18 | Updated: 2024-04-09 | Security Assurance, Model Performance Evaluation, Robustness Evaluation | 2023.08.18 2025.05.28
Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models | Authors: Yugeng Liu, Tianshuo Cong, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang | Published: 2023-08-15 | Updated: 2024-05-06 | Prompt Injection, Model Performance Evaluation, Robustness Evaluation | 2023.08.15 2025.05.28
Robust Ranking Explanations | Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie | Published: 2023-07-08 | Robustness Evaluation, Threat modeling, Explainability | 2023.07.08 2025.05.28
[Re] Double Sampling Randomized Smoothing | Authors: Aryan Gupta, Sarthak Gupta, Abhay Kumar, Harsh Dugar | Published: 2023-06-27 | Malware Classification, Malware Detection Method, Robustness Evaluation | 2023.06.27 2025.05.28
PWSHAP: A Path-Wise Explanation Model for Targeted Variables | Authors: Lucile Ter-Minassian, Oscar Clivio, Karla Diaz-Ordaz, Robin J. Evans, Chris Holmes | Published: 2023-06-26 | Robustness Evaluation, Causal Interpretation, Local Mediation Effect | 2023.06.26 2025.05.28
Theoretical Foundations of Adversarially Robust Learning | Authors: Omar Montasser | Published: 2023-06-13 | Poisoning, Robustness Evaluation, Adversarial Example | 2023.06.13 2025.05.28
A Closer Look at the Adversarial Robustness of Deep Equilibrium Models | Authors: Zonghan Yang, Tianyu Pang, Yang Liu | Published: 2023-06-02 | Robustness Evaluation, Adversarial attack, Adaptive Adversarial Training | 2023.06.02 2025.05.28