文献データベース

AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

Authors: J Rosser, Jakob Nicolaus Foerster | Published: 2025-02-02 | Updated: 2025-04-14
LLM性能評価
マルチオブジェクティブ最適化
安全性アライメント

Safety at Scale: A Comprehensive Survey of Large Model Safety

Authors: Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang | Published: 2025-02-02 | Updated: 2025-03-19
インダイレクトプロンプトインジェクション
プロンプトインジェクション
攻撃手法

LLM Safety Alignment is Divergence Estimation in Disguise

Authors: Rajdeep Haldar, Ziyi Wang, Qifan Song, Guang Lin, Yue Xing | Published: 2025-02-02
プロンプトインジェクション
収束分析
大規模言語モデル
安全性アライメント

Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation

Authors: Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr | Published: 2025-02-01 | Updated: 2025-06-30
RAG
プロンプトリーキング
メンバーシップ推論

Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning

Authors: Maximilian Egger, Mayank Bakshi, Rawad Bitar | Published: 2025-01-31
収束保証
収束分析
通信効率

BounTCHA: A CAPTCHA Utilizing Boundary Identification in Guided Generative AI-extended Videos

Authors: Lehao Lin, Ke Wang, Maha Abdallah, Wei Cai | Published: 2025-01-30 | Updated: 2025-04-01
CAPTCHA
動画信頼性確保
敵対的サンプルの脆弱性

Smoothed Embeddings for Robust Language Models

Authors: Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang | Published: 2025-01-27
プロンプトインジェクション
メンバーシップ推論
敵対的訓練

Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges

Authors: Emad Efatinasab, Alessandro Brighente, Denis Donadel, Mauro Conti, Mirco Rampazzo | Published: 2025-01-27 | Updated: 2025-06-24
エネルギー管理
モデル抽出攻撃
敵対的学習

Improving Network Threat Detection by Knowledge Graph, Large Language Model, and Imbalanced Learning

Authors: Lili Zhang, Quanyan Zhu, Herman Ray, Ying Xie | Published: 2025-01-26
ネットワーク脅威検出
ユーザー活動解析
学習の改善

I Know What You Did Last Summer: Identifying VR User Activity Through VR Network Traffic

Authors: Sheikh Samit Muhaimin, Spyridon Mastorakis | Published: 2025-01-25 | Updated: 2025-05-05
アプリ分類手法
ユーザー行動の変化
機械学習技術