AI Security Portal bot

Jailbreak Distillation: Renewable Safety Benchmarking

Authors: Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson | Published: 2025-05-28
Prompt Injection
Model Evaluation
Attack Evaluation

VulBinLLM: LLM-powered Vulnerability Detection for Stripped Binaries

Authors: Nasir Hussain, Haohan Chen, Chanh Tran, Philip Huang, Zhuohao Li, Pravir Chugh, William Chen, Ashish Kundu, Yuan Tian | Published: 2025-05-28
LLM Security
Vulnerability Analysis
Disassembly

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

Authors: Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei | Published: 2025-05-27
Disabling Safety Mechanisms of LLM
Prompt Injection
Attack Evaluation

JavaSith: A Client-Side Framework for Analyzing Potentially Malicious Extensions in Browsers, VS Code, and NPM Packages

Authors: Avihay Cohen | Published: 2025-05-27
API Security
Client-Side Defense
Prompt Injection

Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

Authors: Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong | Published: 2025-05-27
Model Evaluation
Experimental Validation
Attack Evaluation

SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA

Authors: Jianmin Liu, Li Yan, Borui Li, Lei Yu, Chao Shen | Published: 2025-05-27
Client-Side Defense
Privacy Classification
Encryption Method

IRCopilot: Automated Incident Response with Large Language Models

Authors: Xihuan Lin, Jie Zhang, Gelei Deng, Tianzhe Liu, Xiaolong Liu, Changcai Yang, Tianwei Zhang, Qing Guo, Riqing Chen | Published: 2025-05-27
LLM Security
Indirect Prompt Injection
Model DoS

Respond to Change with Constancy: Instruction-tuning with LLM for Non-I.I.D. Network Traffic Classification

Authors: Xinjie Lin, Gang Xiong, Gaopeng Gou, Wenqi Dong, Jing Yu, Zhen Li, Wei Xia | Published: 2025-05-27
Traffic Classification
Model Performance Evaluation
Structural Learning

Engineering Trustworthy Machine-Learning Operations with Zero-Knowledge Proofs

Authors: Filippo Scaramuzza, Giovanni Quattrocchi, Damian A. Tamburri | Published: 2025-05-26
Privacy Issues
Model Evaluation Methods
Watermarking Technology

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent

Authors: Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp | Published: 2025-05-26
Prompt Injection
Model Extraction Attack
Watermarking Technology