Literature Database

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

Authors: Benji Peng, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin | Published: 2024-10-20 | Updated: 2025-05-08
LLM Security
Disabling LLM Safety Mechanisms
Prompt Injection

A Novel Reinforcement Learning Model for Post-Incident Malware Investigations

Authors: Dipo Dunsin, Mohamed Chahine Ghanem, Karim Ouazzane, Vassil Vassilev | Published: 2024-10-19 | Updated: 2025-01-12
Cybersecurity
Malware Classification

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment

Authors: Zedian Shao, Hongbin Liu, Jaden Mu, Neil Zhenqiang Gong | Published: 2024-10-18 | Updated: 2025-09-15
Indirect Prompt Injection
Data Contamination Detection
Backdoor Attack Methods

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Zheng Liu, Lirong Qiu, Zaisheng Ye | Published: 2024-10-18 | Updated: 2025-07-08
Disabling LLM Safety Mechanisms
Prompt Injection
Prompt Validation

Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

Authors: Shuai Zhao, Xiaobao Wu, Cong-Duy Nguyen, Yanhao Jia, Meihuizi Jia, Yichao Feng, Luu Anh Tuan | Published: 2024-10-18 | Updated: 2025-05-20
Backdoored Model Detection
Backdoor Attack Methods
Knowledge Distillation

Private Counterfactual Retrieval

Authors: Mohamed Nomeir, Pasan Dissanayake, Shreya Meel, Sanghamitra Dutta, Sennur Ulukus | Published: 2024-10-17 | Updated: 2025-07-24
Privacy Protection Methods
Distance Evaluation Methods
Watermark Evaluation

FTSmartAudit: A Knowledge Distillation-Enhanced Framework for Automated Smart Contract Auditing Using Fine-Tuned LLMs

Authors: Zhiyuan Wei, Jing Sun, Zijian Zhang, Xianhao Zhang, Zhe Hou | Published: 2024-10-17 | Updated: 2025-11-03
Detection of Bias in AI Output
Cybersecurity Automation
Information Security

Low-Rank Adversarial PGD Attack

Authors: Dayana Savostianova, Emanuele Zangrando, Francesco Tudisco | Published: 2024-10-16
Attack Methods

Reconstruction of Differentially Private Text Sanitization via Large Language Models

Authors: Shuchao Pang, Zhigang Lu, Haichen Wang, Peng Fu, Yongbin Zhou, Minhui Xue | Published: 2024-10-16 | Updated: 2025-09-18
Privacy Analysis
Prompt Injection
Prompt Leaking

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Authors: Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin | Published: 2024-10-16 | Updated: 2025-10-16
Security Analysis
Data Protection
Model DoS