Robustness Evaluation

Fail-Closed Alignment for Large Language Models

Authors: Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong | Published: 2026-02-19

Prompt Injection

Robustness Evaluation

Defense Methods

2026.02.19

Literature Database

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan | Published: 2023-11-20 | Updated: 2024-02-18

Prompt Injection

Prompt Validation

Robustness Evaluation

2023.11.20 2025.04.03

Literature Database

Instability of computer vision models is a necessary result of the task itself

Authors: Oliver Turnbull, George Cevora | Published: 2023-10-26

Robustness Evaluation

Adversarial Examples

Dimensionality Reduction Methods

2023.10.26 2025.04.03

Literature Database

Attesting Distributional Properties of Training Data for Machine Learning

Authors: Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan | Published: 2023-08-18 | Updated: 2024-04-09

Security Assurance

Model Performance Evaluation

Robustness Evaluation

2023.08.18 2025.04.03

Literature Database

Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models

Authors: Yugeng Liu, Tianshuo Cong, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang | Published: 2023-08-15 | Updated: 2024-05-06

Prompt Injection

Model Performance Evaluation

Robustness Evaluation

2023.08.15 2025.04.03

Literature Database

Robust Ranking Explanations

Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie | Published: 2023-07-08

Robustness Evaluation

Threat Modeling

Explainability

2023.07.08 2025.04.03

Literature Database

[Re] Double Sampling Randomized Smoothing

Authors: Aryan Gupta, Sarthak Gupta, Abhay Kumar, Harsh Dugar | Published: 2023-06-27

Malware Classification

Malware Detection Methods

Robustness Evaluation

2023.06.27 2025.04.03

Literature Database

PWSHAP: A Path-Wise Explanation Model for Targeted Variables

Authors: Lucile Ter-Minassian, Oscar Clivio, Karla Diaz-Ordaz, Robin J. Evans, Chris Holmes | Published: 2023-06-26

Robustness Evaluation

Causal Interpretation

Local Mediation Effects

2023.06.26 2025.04.03

Literature Database

Theoretical Foundations of Adversarially Robust Learning

Authors: Omar Montasser | Published: 2023-06-13

Poisoning

Robustness Evaluation

Adversarial Examples

2023.06.13 2025.04.03

Literature Database

A Closer Look at the Adversarial Robustness of Deep Equilibrium Models

Authors: Zonghan Yang, Tianyu Pang, Yang Liu | Published: 2023-06-02

Robustness Evaluation

Adversarial Attacks

Adaptive Adversarial Training

2023.06.02 2025.04.03

Literature Database