Literature Database

Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks

Authors: Hetvi Waghela, Jaydip Sen, Sneha Rakshit | Published: 2024-08-20
Poisoning
Adversarial Examples
Defense Methods

LeCov: Multi-level Testing Criteria for Large Language Models

Authors: Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma | Published: 2024-08-20
LLM Performance Evaluation
Test Prioritization
Prompt Injection

Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions

Authors: Jinxin Liu, Zao Yang | Published: 2024-08-20 | Updated: 2024-09-05
LLM Performance Evaluation
Privacy-Preserving Methods
Evaluation Methods

Privacy Technologies for Financial Intelligence

Authors: Yang Li, Thilina Ranbaduge, Kee Siong Ng | Published: 2024-08-19
Privacy Protection
Privacy-Preserving Methods
Financial Intelligence

Transferring Backdoors between Large Language Models by Knowledge Distillation

Authors: Pengzhou Cheng, Zongru Wu, Tianjie Ju, Wei Du, Zhuosheng Zhang, Gongshen Liu | Published: 2024-08-19
LLM Security
Backdoor Attacks
Poisoning

Regularization for Adversarial Robust Learning

Authors: Jie Wang, Rui Gao, Yao Xie | Published: 2024-08-19 | Updated: 2024-08-22
Algorithms
Poisoning
Regularization

Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning

Authors: Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu | Published: 2024-08-18 | Updated: 2024-09-03
LLM Security
Prompt Injection
Safety Alignment

Security Concerns in Quantum Machine Learning as a Service

Authors: Satwik Kundu, Swaroop Ghosh | Published: 2024-08-18
Cybersecurity
Data Concealment
Quantum Frameworks

Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training

Authors: Huitong Jin, Yipeng Zhou, Laizhong Cui, Quan Z. Sheng | Published: 2024-08-18
Watermarking
Privacy-Preserving Methods
Model Performance Evaluation

BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger

Authors: Yulin Chen, Haoran Li, Yirui Zhang, Zihao Zheng, Yangqiu Song, Bryan Hooi | Published: 2024-08-17 | Updated: 2025-01-10
AI Compliance
LLM Security
Content Moderation