Alignment

ReCopilot: Reverse Engineering Copilot in Binary Analysis

Authors: Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying | Published: 2025-05-22
Alignment
Binary Analysis
Dynamic Analysis

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye | Published: 2025-05-21
Alignment
Prompt Injection
Defense Mechanisms

sudoLLM : On Multi-role Alignment of Language Models

Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20
Alignment
Prompt Injection
Large Language Models

LlamaFirewall: An open source guardrail system for building secure AI agents

Authors: Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, Joshua Saxe | Published: 2025-05-06
LLM Security
Alignment
Prompt Injection

Bridging Expertise Gaps: The Role of LLMs in Human-AI Collaboration for Cybersecurity

Authors: Shahroz Tariq, Ronal Singh, Mohan Baruwal Chhetri, Surya Nepal, Cecile Paris | Published: 2025-05-06
Effects of Collaboration with LLMs
Alignment
Participant Question Analysis

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Yi Ding, Donghai Hong, Jiaming Ji, Xinfeng Li, Yifan Jiang, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Yanwei Yue, Wenke Huang, Guancheng Wan, Tianlin Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Tianwei Zhang, Xingjun Ma, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Yuval Elovici, Bhavya Kailkhura, Bo Li, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Shuicheng Yan, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu | Published: 2025-04-22
Alignment
Data Generation Safety
Prompt Injection

aiXamine: LLM Safety and Security Simplified

Authors: Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, Minhaj Ahmad, Sanjay Chawla, Issa Khalil | Published: 2025-04-21
LLM Performance Evaluation
Alignment
Performance Evaluation

GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms

Authors: Sinan He, An Wang | Published: 2025-04-17
Alignment
Prompt Injection
Vulnerability Research

Personalized Attacks of Social Engineering in Multi-turn Conversations — LLM Agents for Simulation and Detection

Authors: Tharindu Kumarage, Cameron Johnson, Jadie Adams, Lin Ai, Matthias Kirchner, Anthony Hoogs, Joshua Garland, Julia Hirschberg, Arslan Basharat, Huan Liu | Published: 2025-03-18
Alignment
Social Engineering Attacks
Attack Methods

SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings

Authors: Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng | Published: 2025-02-18 | Updated: 2025-05-21
Alignment
Text Generation Methods
Prompt Injection