Literature Database

Secure and Efficient Access Control for Computer-Use Agents via Context Space

Authors: Haochen Gong, Chenxiao Li, Rui Chang, Wenbo Shen | Published: 2025-09-26 | Updated: 2025-10-21

Indirect Prompt Injection

Agent Design

Security Metric

2025.09.26 2025.10.23

Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment

Authors: Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son | Published: 2025-09-26 | Updated: 2025-10-09

Bias Detection in AI Output

Robustness

Defense Mechanism

2025.09.26 2025.10.11

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Authors: Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen | Published: 2025-09-26 | Updated: 2025-09-30

Disabling Safety Mechanisms of LLM

Self-Attention Mechanism

Interpretability

2025.09.26 2025.10.02

It’s not Easy: Applying Supervised Machine Learning to Detect Malicious Extensions in the Chrome Web Store

Authors: Ben Rosenzweig, Valentino Dalla Valle, Giovanni Apruzzese, Aurore Fass | Published: 2025-09-25 | Updated: 2025-10-02

Program Analysis

User Activity Analysis

Malicious Package Detection

2025.09.25 2025.10.04

No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks

Authors: Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran | Published: 2025-09-25

Training Data Generation

Privacy Protection Mechanism

Privacy Protection Method

2025.09.25 2025.09.27

EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense

Authors: Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin | Published: 2025-09-25

Phishing Attack

Large Language Model

Self-Evolving Framework

2025.09.25 2025.09.27

PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints

Authors: Jiahao Huo, Shuliang Liu, Bin Wang, Junyan Zhang, Yibo Yan, Aiwei Liu, Xuming Hu, Mingxun Zhou | Published: 2025-09-25

Algorithm

Digital Watermarking for Generative AI

Robustness of Watermarking Techniques

2025.09.25 2025.09.27

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

Authors: Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, Shouling Ji | Published: 2025-09-25

Indirect Prompt Injection

Tool Use Analysis

Automated Generation Framework

2025.09.25 2025.09.27

Dual-Path Phishing Detection: Integrating Transformer-Based NLP with Structural URL Analysis

Authors: Ibrahim Altan, Abdulla Bachir, Yousuf Parbhulkar, Abdul Muksith Rizvi, Moshiur Farazi | Published: 2025-09-25

Phishing Attack Trends

Analysis of Detection Methods

Natural Language Processing

2025.09.25 2025.09.27

RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks

Authors: Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang | Published: 2025-09-25

Disabling Safety Mechanisms of LLM

Prompt Injection

Watermark Design

2025.09.25 2025.09.27
