In this paper, we present a novel method for detecting fake and Large
Language Model (LLM)-generated profiles in the LinkedIn Online Social Network
immediately upon registration and before establishing connections. Early fake
profile identification is crucial to maintaining the platform's integrity since
it prevents imposters from acquiring the private and sensitive information of
legitimate users and from gaining an opportunity to increase their credibility
for future phishing and scamming activities. This work uses textual information
provided in LinkedIn profiles and introduces the Section and Subsection Tag
Embedding (SSTE) method to enhance the discriminative characteristics of these
data for distinguishing between legitimate profiles and those created by
imposters manually or by using an LLM. Additionally, the lack of a large
publicly available LinkedIn dataset motivated us to collect 3600 LinkedIn
profiles for our research. We will release our dataset publicly for research
purposes. This is, to the best of our knowledge, the first large publicly
available LinkedIn dataset for fake LinkedIn account detection. Within our
framework, we assess static and contextualized word embeddings, including GloVe,
Flair, BERT, and RoBERTa. We show that the suggested method can distinguish
between legitimate and fake profiles with an accuracy of about 95% across all
word embeddings. In addition, we show that SSTE achieves promising accuracy in
identifying LLM-generated profiles even though no LLM-generated profiles were
used during the training phase, and that it can achieve an accuracy of
approximately 90% when only 20 LLM-generated profiles are added to the
training set. This finding is significant because the expected proliferation
of LLMs makes it extremely challenging to design a single system that can
identify profiles created with various LLMs.
External datasets
LinkedIn dataset of 3600 profiles including legitimate, fake, and ChatGPT-made profiles