文献データベース

Variational Bayesian Bow tie Neural Networks with Shrinkage

Authors: Alisa Sheinkman, Sara Wade | Published: 2024-11-17 | Updated: 2024-11-19
スパースモデル
最適化問題
評価手法

JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit

Authors: Zeqing He, Zhibo Wang, Zhixuan Chu, Huiyu Xu, Wenhui Zhang, Qinglong Wang, Rui Zheng | Published: 2024-11-17 | Updated: 2025-04-24
ジャイルブレイク攻撃に関する具体的な言及があり、関連性が高いため
プロンプトインジェクション
大規模言語モデル

Combining Machine Learning Defenses without Conflicts

Authors: Vasisht Duddu, Rui Zhang, N. Asokan | Published: 2024-11-14 | Updated: 2025-08-14
モデルの頑健性保証
透かし評価
防御手法の統合

TinyML NLP Scheme for Semantic Wireless Sentiment Classification with Privacy Preservation

Authors: Ahmed Y. Radwan, Mohammad Shehab, Mohamed-Slim Alouini | Published: 2024-11-09 | Updated: 2025-04-03
エネルギーベースモデル
プライバシー保護
通信モデル

Unmasking the Shadows: Pinpoint the Implementations of Anti-Dynamic Analysis Techniques in Malware Using LLM

Authors: Haizhou Wang, Nanqing Luo, Xusheng Li, Peng LIu | Published: 2024-11-08 | Updated: 2025-04-29
マルウェア進化
攻撃手法
検出手法の分析

Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods

Authors: Joseph Pollock, Igor Shilov, Euodia Dodd, Yves-Alexandre de Montjoye | Published: 2024-11-08 | Updated: 2025-06-12
パフォーマンス評価
メンバーシップ推論
差分プライバシー

Post-Hoc Robustness Enhancement in Graph Neural Networks with Conditional Random Fields

Authors: Yassine Abbahaddou, Sofiane Ennadir, Johannes F. Lutzeyer, Fragkiskos D. Malliaros, Michalis Vazirgiannis | Published: 2024-11-08
実験的検証

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue

Authors: Fengxiang Wang, Ranjie Duan, Peng Xiao, Xiaojun Jia, Shiji Zhao, Cheng Wei, YueFeng Chen, Chongwen Wang, Jialing Tao, Hang Su, Jun Zhu, Hui Xue | Published: 2024-11-06 | Updated: 2025-01-07
プロンプトインジェクション
マルチラウンド対話

Optimal Defenses Against Gradient Reconstruction Attacks

Authors: Yuxiao Chen, Gamze Gürsoy, Qi Lei | Published: 2024-11-06
ポイズニング
防御手法

FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses

Authors: Isaac Baglin, Xiatian Zhu, Simon Hadfield | Published: 2024-11-05 | Updated: 2025-01-05
ポイズニング
攻撃の評価
評価手法