Ethical Standards Compliance

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang | Published: 2025-10-30

Prompt Injection

Impact of Generalization

Ethical Standards Compliance

2025.10.30 2025.11.01

Literature Database

Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions

Authors: Qinnan Hu, Yuntao Wang, Yuan Gao, Zhou Su, Linkang Du | Published: 2025-09-11

Relationship of AI Systems

Ethical Standards Compliance

Anomaly Detection Method

2025.09.11 2025.09.13

Literature Database

Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs

Authors: Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, Zhifei Zheng, Min Liu, Zhiyi Yin, Jianping Zhang | Published: 2025-08-22 | Updated: 2025-09-15

Privacy Assessment

Ethical Standards Compliance

Large Language Model

2025.08.22 2025.09.17

Literature Database

Rethinking Exact Unlearning under Exposure: Extracting Forgotten Data under Exact Unlearning in Large Language Model

Authors: Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu | Published: 2025-05-30 | Updated: 2025-10-06

Privacy-Preserving Machine Learning

Privacy Loss Analysis

Ethical Standards Compliance

2025.05.30 2025.10.08

Literature Database

Adversarial Suffix Filtering: a Defense Pipeline for LLMs

Authors: David Khachaturov, Robert Mullins | Published: 2025-05-14

Prompt Validation

Ethical Standards Compliance

Attack Detection Method

2025.05.14 2025.05.28

Literature Database