As natural language processing methods are increasingly deployed in
real-world scenarios such as healthcare, legal systems, and social science, it
becomes necessary to recognize the role they potentially play in shaping social
biases and stereotypes. Previous work has revealed the presence of social
biases in widely used word embeddings involving gender, race, religion, and
other social constructs. While several methods have been proposed to debias
these word-level embeddings, there is a need to perform debiasing at the
sentence level given the recent shift towards contextualized sentence
representations such as ELMo and BERT. In this paper, we investigate the
presence of social biases in sentence-level representations and propose a new
method, Sent-Debias, to reduce these biases. We show that Sent-Debias is
effective in removing biases while preserving performance on
sentence-level downstream tasks such as sentiment analysis, linguistic
acceptability, and natural language understanding. We hope that our work will
inspire future research on characterizing and removing social biases from
widely adopted sentence representations for fairer NLP.
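The abstract does not spell out how Sent-Debias operates, but subspace-based debiasing of sentence embeddings can be sketched in a few lines. The following is a minimal NumPy illustration, assuming a bias subspace is estimated from the principal components of embedding differences between counterfactual sentence pairs (e.g. the same sentence with "he"/"she" swapped); the function names are illustrative, not the authors' API:

```python
import numpy as np

def estimate_bias_subspace(pair_diffs, k=1):
    """Estimate a k-dimensional bias subspace.

    pair_diffs: (n, d) array of differences between embeddings of
    sentence pairs that differ only in a bias attribute.
    Returns a (k, d) array of orthonormal bias directions (top
    right-singular vectors of the centered differences).
    """
    centered = pair_diffs - pair_diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def debias(embeddings, bias_subspace):
    """Remove each embedding's component lying in the bias subspace."""
    # Project onto the subspace, then subtract the projection.
    proj = embeddings @ bias_subspace.T @ bias_subspace
    return embeddings - proj
```

After this projection, the debiased embeddings are orthogonal to every estimated bias direction, while components outside the subspace (which carry most task-relevant semantics) are untouched; this is the standard trade-off the abstract alludes to between bias removal and downstream performance.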