These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The advancement of AI technologies, particularly Large Language Models
(LLMs), has transformed computing while introducing new security and privacy
risks. Prior research shows that cybercriminals are increasingly leveraging
uncensored LLMs (ULLMs) as backends for malicious services. Understanding these
ULLMs has been hindered by the challenge of identifying them among the vast
number of open-source LLMs hosted on platforms like Hugging Face. In this
paper, we present the first systematic study of ULLMs, overcoming this
challenge by modeling relationships among open-source LLMs and between them and
related data, such as fine-tuning, merging, compressing models, and using or
generating datasets with harmful content. Representing these connections as a
knowledge graph, we applied graph-based deep learning to discover over 11,000
ULLMs from a small set of labeled examples and uncensored datasets.
A closer analysis of these ULLMs reveals their alarming scale and usage. Some
have been downloaded over a million times, with one over 19 million installs.
These models -- created through fine-tuning, merging, or compression of other
models -- are capable of generating harmful content, including hate speech,
violence, erotic material, and malicious code. Evidence shows their integration
into hundreds of malicious applications offering services like erotic
role-play, child pornography, malicious code generation, and more. In addition,
underground forums reveal criminals sharing techniques and scripts to build
cheap alternatives to commercial malicious LLMs. These findings highlight the
widespread abuse of LLM technology and the urgent need for effective
countermeasures against this growing threat.