Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes


arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2508.12622

PDF

https://arxiv.org/pdf/2508.12622

Paper Information

Authors: Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang
Published: August 18, 2025
Affiliation: University of Missouri–Kansas City
Country: United States of America

Labels Estimated by AI

Calculation of Output Harmfulness

Data Generation Method

Disabling Safety Mechanisms of LLM

These labels were automatically added by AI and may be inaccurate.

Abstract

The advancement of AI technologies, particularly Large Language Models (LLMs), has transformed computing while introducing new security and privacy risks. Prior research shows that cybercriminals are increasingly leveraging uncensored LLMs (ULLMs) as backends for malicious services. Understanding these ULLMs has been hindered by the challenge of identifying them among the vast number of open-source LLMs hosted on platforms like Hugging Face. In this paper, we present the first systematic study of ULLMs, overcoming this challenge by modeling relationships among open-source LLMs and between them and related data, such as fine-tuning, merging, compressing models, and using or generating datasets with harmful content. Representing these connections as a knowledge graph, we applied graph-based deep learning to discover over 11,000 ULLMs from a small set of labeled examples and uncensored datasets. A closer analysis of these ULLMs reveals their alarming scale and usage. Some have been downloaded over a million times, with one over 19 million installs. These models (created through fine-tuning, merging, or compression of other models) are capable of generating harmful content, including hate speech, violence, erotic material, and malicious code. Evidence shows their integration into hundreds of malicious applications offering services like erotic role-play, child pornography, malicious code generation, and more. In addition, underground forums reveal criminals sharing techniques and scripts to build cheap alternatives to commercial malicious LLMs. These findings highlight the widespread abuse of LLM technology and the urgent need for effective countermeasures against this growing threat.
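The abstract describes the discovery step only at a high level: known uncensored models and harmful datasets are linked to other models through fine-tuning, merging, compression, and training-data relations, and a graph model spreads that signal to unlabeled nodes. The Python snippet below is a rough illustration of that idea under loud assumptions. It substitutes plain iterative label propagation for the graph-based deep learning the authors report, it ignores edge type and direction, and every node name, edge, seed label, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage graph: each edge links a model to a parent model or to a
# dataset it was trained on. Relation names are illustrative only.
EDGES = [
    ("base-7b", "chat-7b", "fine_tune"),
    ("chat-7b", "chat-7b-gguf", "quantize"),
    ("toxic-dpo", "chat-7b", "trained_on"),
    ("chat-7b", "merge-7b", "merge"),
    ("other-7b", "merge-7b", "merge"),
    ("base-7b", "instruct-7b", "fine_tune"),
]

# Hypothetical seed labels: 1.0 = known uncensored model or harmful dataset,
# 0.0 = known safety-aligned model. All other nodes start at 0.5.
SEEDS = {"toxic-dpo": 1.0, "instruct-7b": 0.0}


def propagate(edges, seeds, iterations=100):
    """Simple label propagation (not the paper's method): each unlabeled node
    repeatedly takes the mean score of its neighbours, while seed nodes stay
    clamped to their given labels. Edge type and direction are ignored."""
    neighbours = defaultdict(set)
    for src, dst, _relation in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    scores = {node: seeds.get(node, 0.5) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                updated[node] = seeds[node]
            else:
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


def flag_suspects(scores, seeds, threshold=0.5):
    """Return unlabeled nodes whose propagated score exceeds a hypothetical threshold."""
    return sorted(n for n, s in scores.items() if s > threshold and n not in seeds)


if __name__ == "__main__":
    scores = propagate(EDGES, SEEDS)
    for node, score in sorted(scores.items()):
        print(f"{node:15s} {score:.3f}")
    print(flag_suspects(scores, SEEDS))
```

On this toy graph the scores converge to 2/3 for `chat-7b` and everything reachable through it, and to 1/3 for `base-7b`, so `flag_suspects` returns `['chat-7b', 'chat-7b-gguf', 'merge-7b', 'other-7b']`. Flagging `other-7b` shows the cost of treating edges as undirected and untyped: a merge partner inherits suspicion from its sibling. This is one reason a learned model that weights relation types, as the abstract's graph-based approach presumably does, would be preferable to this sketch.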

External Datasets

toxic-dpo-v0.1

Capybara

Pure-Dove

LessWrong-Amplify-Instruct

CatQA

DPO_Pairs-Roleplay-Alpaca-NSFW

racist-dataset