Abstract
The use of Natural Language Processing (NLP) in high-stakes AI-based
applications has increased significantly in recent years, especially since the
emergence of Large Language Models (LLMs). However, despite their strong
performance, LLMs introduce important legal/ethical concerns, particularly
regarding privacy, data protection, and transparency. Due to these concerns,
this work explores the use of Named-Entity Recognition (NER) to facilitate the
privacy-preserving training (or adaptation) of LLMs. We propose a framework
that uses NER technologies to anonymize sensitive information in text data,
such as personal identities or geographic locations. An evaluation of the
proposed privacy-preserving learning framework was conducted to measure its
impact on user privacy and system performance in a particular high-stakes and
sensitive setup: AI-based resume scoring for recruitment processes. The study
involved two language models (BERT and RoBERTa) and six anonymization
algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT)
applied to a database of 24,000 candidate profiles. The findings indicate that
the proposed privacy preservation techniques effectively maintain system
performance while playing a critical role in safeguarding candidate
confidentiality, thus promoting trust in the evaluated scenario. On top of
the proposed privacy-preserving approach, we also experiment with applying an
existing approach that reduces the gender bias in LLMs, thus finally obtaining
our proposed Privacy- and Bias-aware LLMs (PBa-LLMs). Note that the proposed
PBa-LLMs have been evaluated in a particular setup (resume scoring), but are
generally applicable to any other LLM-based AI application.
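
The anonymization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `anonymize`, the `(start, end, label)` span format, and the sample resume text are all hypothetical, and the spans are written by hand to stand in for the output of an NER tagger (such as those based on Presidio, FLAIR, BERT, or GPT) rather than produced by one.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label) character offsets,
# standing in for whatever an NER tagger would return.
Span = Tuple[int, int, str]


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each tagged span with a placeholder such as <PERSON>."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip spans that overlap one already replaced.
            continue
        pieces.append(text[cursor:start])  # keep the untouched text
        pieces.append(f"<{label}>")        # mask the sensitive entity
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-written example input (assumed, not taken from the paper's data).
resume = "Maria Lopez, based in Madrid, has 5 years of experience in data engineering."
spans = [(0, 11, "PERSON"), (22, 28, "LOCATION")]
anonymized = anonymize(resume, spans)
print(anonymized)
```

In a pipeline of the kind the abstract describes, text masked this way would replace the raw candidate profiles before training or adapting the language model, so that personal identities and locations are not exposed to it.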