Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Authors: Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | Published: 2024-04-02 | Updated: 2024-10-07 | 2024.04.02 2025.04.03 Literature Database
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Authors: Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan | Published: 2024-04-02
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack | Authors: Ying Zhou, Ben He, Le Sun | Published: 2024-04-02
AAA: an Adaptive Mechanism for Locally Differential Private Mean Estimation | Authors: Fei Wei, Ergute Bao, Xiaokui Xiao, Yin Yang, Bolin Ding | Published: 2024-04-02 | Updated: 2024-04-03
Can Biases in ImageNet Models Explain Generalization? | Authors: Paul Gavrikov, Janis Keuper | Published: 2024-04-01
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models | Authors: Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini | Published: 2024-04-01
Machine Unlearning for Traditional Models and Large Language Models: A Short Survey | Authors: Yi Xu | Published: 2024-04-01
Enhancing Reasoning Capacity of SLM using Cognitive Enhancement | Authors: Jonathan Pan, Swee Liang Wong, Xin Wei Chia, Yidi Yuan | Published: 2024-04-01
An incremental hybrid adaptive network-based IDS in Software Defined Networks to detect stealth attacks | Authors: Abdullah H Alqahtani | Published: 2024-04-01
What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Authors: Luxi He, Mengzhou Xia, Peter Henderson | Published: 2024-04-01 | Updated: 2024-08-20