Literature Database

Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection

Authors: Takashi Koide, Hiroki Nakano, Daiki Chiba | Published: 2026-02-05

Indirect Prompt Injection

Phishing Detection Methods

Prompt Injection

2026.02.05

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

Authors: Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Guowen Xu | Published: 2026-02-05

LLM Performance Evaluation

Data Toxicity

Large Language Models

2026.02.05

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Authors: Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang | Published: 2026-02-05

Security Metrics

Description of Attack Methods

Toxicity Attack-Focused Content

2026.02.05

SynAT: Enhancing Security Knowledge Bases via Automatic Synthesizing Attack Tree from Crowd Discussions

Authors: Ziyou Jiang, Lin Shi, Guowei Yang, Xuyan Ma, Fenglong Li, Qing Wang | Published: 2026-02-05

LLM Performance Evaluation

Safety of Data Generation

Attack Tree Synthesis

2026.02.05

Hallucination-Resistant Security Planning with a Large Language Model

Authors: Kim Hammar, Tansu Alpcan, Emil Lupu | Published: 2026-02-05

LLM Performance Evaluation

Hallucination

Hallucination Detection

2026.02.05

Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach

Authors: Vishruti Kakkad, Paul Chung, Hanan Hibshi, Maverick Woo | Published: 2026-02-04

Poisoning

Model Extraction Attack

Educational Methods

2026.02.04

How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks

Authors: Yanshu Wang, Shuaishuai Yang, Jingjing He, Tong Yang | Published: 2026-02-04

LLM Performance Evaluation

Prompt Injection

Large Language Models

2026.02.04

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Authors: Guang Yang, Xing Hu, Xiang Chen, Xin Xia | Published: 2026-02-04

Code Generation Security

Backdoored Model Detection

Model Extraction Attack

2026.02.04

Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits

Authors: Qingwen Zhang, Wenjia Wang | Published: 2026-02-04

Algorithm Design

Robust Estimation

Statistical Methods

2026.02.04

Don’t believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions