Investigating Security Implications of Automatically Generated Code on the Software Supply Chain | Authors: Xiaofan Li, Xing Gao | Published: 2025-09-24 | Tags: Alignment, Indirect Prompt Injection, Vulnerability Research | 2025.09.24 2025.09.26 | Literature Database
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal | Published: 2025-06-16 | Tags: Alignment, Prompt Injection, Large Language Model | 2025.06.16 2025.06.18
Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis | Authors: Avihay Cohen | Published: 2025-06-04 | Tags: Alignment, Prompt Injection, Dynamic Analysis | 2025.06.04 2025.06.06
MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment | Authors: John Halloran | Published: 2025-05-29 | Tags: Poisoning Attack on RAG, Alignment, Cooking Ingredients | 2025.05.29 2025.05.31
Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion | Authors: Chunlong Xie, Jialing He, Shangwei Guo, Jiacheng Wang, Shudong Zhang, Tianwei Zhang, Tao Xiang | Published: 2025-05-29 | Tags: Alignment, Adversarial Object Generation, Optimization Methods | 2025.05.29 2025.05.31
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization | Authors: Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun | Published: 2025-05-22 | Tags: LLM Security, Alignment, Adversarial Learning | 2025.05.22 2025.05.28
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning | Authors: Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen | Published: 2025-05-22 | Tags: Alignment, Indirect Prompt Injection, Calculation of Output Harmfulness | 2025.05.22 2025.05.28
ReCopilot: Reverse Engineering Copilot in Binary Analysis | Authors: Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying | Published: 2025-05-22 | Tags: Alignment, Binary Analysis, Dynamic Analysis | 2025.05.22 2025.05.28
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses | Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye | Published: 2025-05-21 | Tags: Alignment, Prompt Injection, Defense Mechanism | 2025.05.21 2025.05.28
sudoLLM : On Multi-role Alignment of Language Models | Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20 | Tags: Alignment, Prompt Injection, Large Language Model | 2025.05.20 2025.05.28