Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization | Authors: Furkan Mumcu, Yasin Yilmaz | Published: 2026-03-04 | Tags: Alignment, Robust Optimization, Optimization Methods | 2026.03.04 2026.03.06 Literature Database
A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality | Authors: Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan | Published: 2026-03-04 | Tags: LLM Performance Evaluation, Alignment, Evaluation Metrics | 2026.03.04 2026.03.06 Literature Database
Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution | Authors: Guoxin Shi, Haoyu Wang, Zaihui Yang, Yuxing Wang, Yongzhe Chang | Published: 2026-03-02 | Tags: Alignment, Safety Evaluation, Machine Learning Applications | 2026.03.02 2026.03.04 Literature Database
Layer-Targeted Multilingual Knowledge Erasure in Large Language Models | Authors: Taoran Li, Varun Chandrasekaran, Zhiyuan Yu | Published: 2026-02-26 | Tags: Alignment, Machine Learning, Machine Learning Method | 2026.02.26 2026.02.28 Literature Database
From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses | Authors: Xiangtao Meng, Tianshuo Cong, Li Wang, Wenyu Chen, Zheng Li, Shanqing Guo, Xiaoyun Wang | Published: 2025-10-09 | Tags: Alignment, Indirect Prompt Injection, Defense Effectiveness Analysis | 2025.10.09 2025.10.11 Literature Database
Investigating Security Implications of Automatically Generated Code on the Software Supply Chain | Authors: Xiaofan Li, Xing Gao | Published: 2025-09-24 | Tags: Alignment, Indirect Prompt Injection, Vulnerability Research | 2025.09.24 2025.09.26 Literature Database
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal | Published: 2025-06-16 | Tags: Alignment, Prompt Injection, Large Language Model | 2025.06.16 2025.06.18 Literature Database
QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety | Authors: Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng | Published: 2025-06-14 | Updated: 2025-09-30 | Tags: Alignment, Ethical Statement, Malicious Prompt | 2025.06.14 2025.10.02 Literature Database
The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs | Authors: Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu | Published: 2025-06-06 | Updated: 2025-10-30 | Tags: Alignment, Large Language Model, Safety Evaluation | 2025.06.06 2025.11.01 Literature Database
Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis | Authors: Avihay Cohen | Published: 2025-06-04 | Tags: Alignment, Prompt Injection, Dynamic Analysis | 2025.06.04 2025.06.06 Literature Database