AI Security Portal bot

Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis

Authors: Jeonghwan Park, Niall McLaughlin, Ihsen Alouani | Published: 2025-03-04 | Updated: 2025-03-16
Attack Methods
Adversarial Example Detection
Deep Learning

Privacy-preserving Machine Learning in Internet of Vehicle Applications: Fundamentals, Recent Advances, and Future Direction

Authors: Nazmul Islam, Mohammad Zulkernine | Published: 2025-03-03 | Updated: 2025-07-08
Privacy Risk Management
Traffic Simulation
Federated Learning

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Authors: Wang YuHang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu | Published: 2025-03-02 | Updated: 2025-03-21
Robustness
Adversarial Learning
Adversarial Training

Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems

Authors: Song Xia, Yi Yu, Wenhan Yang, Meiwen Ding, Zhuo Chen, Ling-Yu Duan, Alex C. Kot, Xudong Jiang | Published: 2025-03-01 | Updated: 2025-04-03
Privacy Protection
Model Robustness Guarantees
Model Performance Evaluation

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

Authors: Hanjiang Hu, Alexander Robey, Changliu Liu | Published: 2025-02-28 | Updated: 2025-08-25
Backdoor Attacks
Prompt Injection
Watermarking

Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots

Authors: Xiaoqun Liu, Jiacheng Liang, Qiben Yan, Jiyong Jang, Sicheng Mao, Muchao Ye, Jinyuan Jia, Zhaohan Xi | Published: 2025-02-28 | Updated: 2025-04-16
Cyber Threat Intelligence
Prompt Leaking
Model Extraction Attacks

Models That Are Interpretable But Not Transparent

Authors: Chudi Zhong, Panyu Chen, Cynthia Rudin | Published: 2025-02-26
Methods for Providing Explainability While Keeping Model Information Confidential
Information Security
Attacker Behavior Analysis

Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs

Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen | Published: 2025-02-26 | Updated: 2025-05-28
LLM Security
Prompt Injection
Attack Evaluation

Evaluating Membership Inference Attacks in heterogeneous-data setups

Authors: Bram van Dartel, Marc Damie, Florian Hahn | Published: 2025-02-26 | Updated: 2025-04-28
Dataset Generation
Privacy Protection
Attack Types

Detecting Benchmark Contamination Through Watermarking

Authors: Tom Sander, Pierre Fernandez, Saeed Mahloujifar, Alain Durmus, Chuan Guo | Published: 2025-02-24 | Updated: 2025-07-21
Watermarking
Data Contamination Detection
Performance Evaluation