Literature Database

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

Authors: Yuanpu Cao, Bochuan Cao, Jinghui Chen | Published: 2023-11-15 | Updated: 2024-06-09

Backdoor Attacks

Prompt Injection

2023.11.15 2025.04.03

HAL 9000: Skynet’s Risk Manager

Authors: Tadeu Freitas, Mário Neto, Inês Dutra, João Soares, Manuel Correia, Rolando Martins | Published: 2023-11-15

Software Security

Machine Learning Methods

Vulnerability Management

2023.11.15 2025.04.03

Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment

Authors: Haoran Wang, Kai Shu | Published: 2023-11-15 | Updated: 2024-08-15

Prompt Injection

Attack Methods

Natural Language Processing

2023.11.15 2025.04.03

Are Normalizing Flows the Key to Unlocking the Exponential Mechanism?

Authors: Robert A. Bridges, Vandy J. Tombs, Christopher B. Stanley | Published: 2023-11-15 | Updated: 2024-06-11

Privacy Protection

Convergence Properties

Machine Learning Methods

2023.11.15 2025.04.03

Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts

Authors: Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun | Published: 2023-11-15 | Updated: 2024-01-20

Prompt Injection

Attack Methods

Face Recognition

2023.11.15 2025.04.03

A Robust Semantics-based Watermark for Large Language Model against Paraphrasing

Authors: Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, Jiliang Tang | Published: 2023-11-15 | Updated: 2024-04-01

Prompt Injection

Robustness Evaluation

Information Hiding Techniques

2023.11.15 2025.04.03

KnowSafe: Combined Knowledge and Data Driven Hazard Mitigation in Artificial Pancreas Systems

Authors: Xugui Zhou, Maxfield Kouzel, Chloe Smith, Homa Alemzadeh | Published: 2023-11-13

CPS Control Models

Control Action Generation

Hazard Prediction and Mitigation

2023.11.13 2025.04.03

Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models

Authors: Yuanbin Cheng, Koji Yamashita, Jim Follum, Nanpeng Yu | Published: 2023-11-13

Adversarial Text Purification

Optimization Problems

Defense Methods

2023.11.13 2025.04.03

Seeing is Believing: A Federated Learning Based Prototype to Detect Wireless Injection Attacks

Authors: Aadil Hussain, Nitheesh Gundapu, Sarang Drugkar, Suraj Kiran, J. Harshan, Ranjitha Prasad | Published: 2023-11-11

Learning Improvement

Deep Learning Methods

Defense Methods

2023.11.11 2025.04.03

Does Differential Privacy Prevent Backdoor Attacks in Practice?

Authors: Fereshteh Razmi, Jian Lou, Li Xiong | Published: 2023-11-10

Data Privacy Evaluation

Trade-off Analysis

Defense Methods

2023.11.10 2025.04.03