Attack Method

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Authors: Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | Published: 2024-04-02 | Updated: 2024-10-07

LLM Security

Prompt Injection

Attack Method

2024.04.02 2025.05.27

Literature Database

Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

Authors: Ying Zhou, Ben He, Le Sun | Published: 2024-04-02

LLM Security

Watermarking

Attack Method

2024.04.02 2025.05.27

Adversarial Attacks and Defenses in Fault Detection and Diagnosis: A Comprehensive Benchmark on the Tennessee Eastman Process

Authors: Vitaliy Pozdnyakov, Aleksandr Kovalenko, Ilya Makarov, Mikhail Drobyshevskiy, Kirill Lukyanov | Published: 2024-03-20 | Updated: 2024-06-07

Attack Method

Adversarial Example

Defense Method

2024.03.20 2025.05.27

Robustness bounds on the successful adversarial examples in probabilistic models: Implications from Gaussian processes

Authors: Hiroaki Maeshima, Akira Otsuka | Published: 2024-03-04 | Updated: 2025-03-19

Attack Method

Adversarial Example

Watermark Evaluation

2024.03.04 2025.05.27

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

Authors: Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, Zhou Li | Published: 2024-03-02

LLM Security

Prompt Injection

Attack Method

2024.03.02 2025.05.27

Attacking Delay-based PUFs with Minimal Adversary Model

Authors: Hongming Fei, Owen Millwood, Prosanta Gope, Jack Miskelly, Biplab Sikdar | Published: 2024-03-01

Evaluation Methods for PUF

Model Performance Evaluation

Attack Method

2024.03.01 2025.05.27

Coercing LLMs to do and reveal (almost) anything

Authors: Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein | Published: 2024-02-21

LLM Security

Prompt Injection

Attack Method

2024.02.21 2025.05.27

The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Yu Kong, Tianlong Chen, Huan Liu | Published: 2024-02-20 | Updated: 2024-06-03

LLM Security

Classification of Malicious Actors

Attack Method

2024.02.20 2025.05.27

IT Intrusion Detection Using Statistical Learning and Testbed Measurements

Authors: Xiaoxuan Wang, Rolf Stadler | Published: 2024-02-20

CVE Information Extraction

Intrusion Detection System

Attack Method

2024.02.20 2025.05.27

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning

Authors: Shuai Zhao, Leilei Gan, Luu Anh Tuan, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen | Published: 2024-02-19 | Updated: 2024-03-29

Backdoor Detection

Attack Method

Defense Method

2024.02.19 2025.05.27
