Abstract
Dementia is a sensitive neurocognitive disorder affecting tens of millions of
people worldwide and its cases are expected to triple by 2050. Alarmingly,
recent advancements in dementia classification make it possible for adversaries
to violate affected individuals' privacy and infer their sensitive condition
from speech transcriptions. Existing text obfuscation methods have never
been applied to dementia and depend on the availability of large labeled
datasets, which are challenging to collect for sensitive medical attributes. In
this work, we bridge this research gap and tackle the above issues by
leveraging large language models (LLMs) with diverse prompt designs (zero-shot,
few-shot, and knowledge-based) to obfuscate dementia in speech transcripts. Our
evaluation shows that LLMs are more effective dementia obfuscators than
competing methods. However, they have billions of parameters, which makes them
hard to train, store, and share, and they are also fragile, suffering from
hallucination, refusal, and contradiction effects, among others. To further
mitigate these issues, we propose a novel method, DiDOTS. DiDOTS distills knowledge
from LLMs using a teacher-student paradigm and parameter-efficient fine-tuning.
DiDOTS has one order of magnitude fewer parameters than its teacher LLM
and can be fine-tuned using three orders of magnitude fewer parameters than
full fine-tuning. Our evaluation shows that DiDOTS retains the performance of
LLMs, achieving 1.3x and 2.2x improvements in privacy performance over prior
work on two datasets, while humans rate it as better at preserving utility,
even when compared to state-of-the-art paraphrasing models.