Trustworthy AI: Safety, Bias, and Privacy

Trustworthy AI: Safety, Bias, and Privacy — A Survey

Authors: Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim | Published: 2025-02-11 | Updated: 2025-06-11

2025.02.112025.06.13

Authors: Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim
Published: 2025-02-11 | Updated: 2025-06-11

Source: https://arxiv.org/abs/2502.10450

PDF: https://arxiv.org/pdf/2502.10450

Labels Predicted by AI

Differential Privacy Bias Prompt leaking

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

The capabilities of artificial intelligence systems have been advancing to a great extent, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field, and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates the issues regarding three thrusts: safety, privacy, and bias, which hurt models’ trustworthiness. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.