Efficient Preference Poisoning Attack on Offline RLHF | Authors: Chenye Yang, Weiyu Xu, Lifeng Lai | Published: 2026-05-04 | Literature Database
FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption | Authors: Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia | Published: 2026-04-30 | Literature Database
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning | Authors: Bowen Sun, Chaozhuo Li, Yaodong Yang, Yiwei Wang, Chaowei Xiao | Published: 2026-04-30 | Literature Database
MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks | Authors: Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin, Stjepan Picek | Published: 2026-04-30 | Literature Database
VOW: Verifiable and Oblivious Watermark Detection for Large Language Models | Authors: Xiaokun Luan, Yihao Zhang, Pengcheng Su, Feiran Lei, Meng Sun | Published: 2026-04-30 | Literature Database
Low Rank Adaptation for Adversarial Perturbation | Authors: Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang, Ning Zhang | Published: 2026-04-30 | Literature Database
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study | Authors: Luyao Xu, Xiang Chen | Published: 2026-04-30 | Literature Database
AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Byzantine-Robust Federated Learning | Authors: Zehui Tang, Yuchen Liu, Feihu Huang | Published: 2026-04-30 | Literature Database
Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations | Authors: Md Hasan Saju, Akramul Azim | Published: 2026-04-30 | Literature Database
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version) | Authors: Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin | Published: 2026-04-30 | Literature Database