FedTruth: Byzantine-Robust and Backdoor-Resilient Federated Learning Framework Authors: Sheldon C. Ebron Jr., Kan Yang | Published: 2023-11-17 2023.11.17 2025.05.28 Literature Database
You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks Authors: Rafael Uetz, Marco Herzog, Louis Hackländer, Simon Schwarz, Martin Henze | Published: 2023-11-16 | Updated: 2023-12-19 2023.11.16 2025.05.28 Literature Database
Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring Authors: Yuhang Li, Yihan Wang, Zhouxing Shi, Cho-Jui Hsieh | Published: 2023-11-16 2023.11.16 2025.05.28 Literature Database
Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework Authors: Matthew Pisano, Peter Ly, Abraham Sanders, Bingsheng Yao, Dakuo Wang, Tomek Strzalkowski, Mei Si | Published: 2023-11-16 | Updated: 2024-08-18 2023.11.16 2025.05.28 Literature Database
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections Authors: Yuanpu Cao, Bochuan Cao, Jinghui Chen | Published: 2023-11-15 | Updated: 2024-06-09 2023.11.15 2025.05.28 Literature Database
HAL 9000: Skynet’s Risk Manager Authors: Tadeu Freitas, Mário Neto, Inês Dutra, João Soares, Manuel Correia, Rolando Martins | Published: 2023-11-15 2023.11.15 2025.05.28 Literature Database
Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment Authors: Haoran Wang, Kai Shu | Published: 2023-11-15 | Updated: 2024-08-15 2023.11.15 2025.05.28 Literature Database
Are Normalizing Flows the Key to Unlocking the Exponential Mechanism? Authors: Robert A. Bridges, Vandy J. Tombs, Christopher B. Stanley | Published: 2023-11-15 | Updated: 2024-06-11 2023.11.15 2025.05.28 Literature Database
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts Authors: Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun | Published: 2023-11-15 | Updated: 2024-01-20 2023.11.15 2025.05.28 Literature Database
A Robust Semantics-based Watermark for Large Language Model against Paraphrasing Authors: Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, Jiliang Tang | Published: 2023-11-15 | Updated: 2024-04-01 2023.11.15 2025.05.28 Literature Database