Vol-3194/paper65


Paper
description  
id  Vol-3194/paper65
wikidataid  Q117344899
title  Bias Score: Estimating Gender Bias in Sentence Representations
pdfUrl  https://ceur-ws.org/Vol-3194/paper65.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/AzzaliniDT22
volume  Vol-3194
session

Bias Score: Estimating Gender Bias in Sentence Representations


Bias Score: Estimating Gender Bias in Sentence Representations
(Discussion Paper)

Fabio Azzalini¹,², Tommaso Dolci¹ and Mara Tanelli¹
¹ Politecnico di Milano – Dipartimento di Elettronica, Informazione e Bioingegneria
² Human Technopole – Center for Analysis, Decisions and Society


Abstract
The ever-increasing number of applications based on semantic text analysis is making natural language
understanding a fundamental task. Language models are used for a variety of tasks, such as parsing
CVs or improving web search results. At the same time, concern is growing around embedding-based
language models, which often exhibit social bias and lack of transparency, despite their popularity and
widespread use. Word embeddings in particular exhibit a large amount of gender bias, and they have
been shown to reflect social stereotypes. Recently, sentence embeddings have been introduced as a
novel and powerful technique to represent entire sentences as vectors. However, traditional methods for
estimating gender bias cannot be applied to sentence representations, because gender-neutral entities
cannot be easily identified and listed. We propose a new metric to estimate gender bias in sentence
embeddings, named bias score. Our solution, leveraging the semantic importance of individual words
and previous research on gender bias in word embeddings, is able to discern between correct and biased
gender information at sentence level. Experiments on a real-world dataset demonstrate that our novel
metric identifies gender-stereotyped sentences.

Keywords
Gender bias, natural language processing, computer ethics




1. Introduction
Language models are used for a variety of downstream applications, such as CV parsing for a
job position, or detecting sexist comments on social networks. Recently, a big step forward in
the field of natural language processing (NLP) was the introduction of language models based
on word embeddings, i.e. representations of words as vectors in a multi-dimensional space.
These models translate the semantics of words into geometric properties, so that terms with
similar meanings tend to have their vectors close to each other, and the difference between two
embeddings represents the relationship between their respective words [1]. For instance, it is
possible to retrieve the analogy 𝑚𝑎𝑛 : 𝑘𝑖𝑛𝑔 = 𝑤𝑜𝑚𝑎𝑛 : 𝑞𝑢𝑒𝑒𝑛 because the difference vectors
$\vec{queen} - \vec{king}$ and $\vec{woman} - \vec{man}$ share approximately the same direction.
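As an illustration of this property, the analogy above can be retrieved by simple vector arithmetic. The following minimal sketch (not part of the original paper) assumes a dictionary of GloVe vectors loaded from a standard GloVe text file; the file name and helper names are illustrative only.

import numpy as np

def load_glove(path, dim=300):
    # Read GloVe vectors from a whitespace-separated text file: token followed by `dim` floats.
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = " ".join(parts[:-dim])   # some GloVe releases contain tokens with spaces
            embeddings[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return embeddings

def cosine(u, v):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(target, glove, exclude=()):
    # Vocabulary word whose embedding has the highest cosine similarity with `target`.
    best_word, best_sim = None, -1.0
    for word, vec in glove.items():
        if word in exclude:
            continue
        sim = cosine(target, vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

glove = load_glove("glove.840B.300d.txt")   # illustrative file name
# man : king = woman : ?  ->  the query vector is expected to land near "queen"
query = glove["king"] - glove["man"] + glove["woman"]
print(most_similar(query, glove, exclude={"king", "man", "woman"}))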
   Word embeddings boosted results in many NLP tasks, like sentiment analysis and question
answering. However, despite the growing hype around them, these models have been shown
SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: fabio.azzalini@polimi.it (F. Azzalini); tommaso.dolci@polimi.it (T. Dolci); mara.tanelli@polimi.it (M. Tanelli)
ORCID: 0000-0003-0631-2120 (F. Azzalini); 0000-0002-7172-0203 (M. Tanelli)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
to reflect the stereotypes of Western society, even when the training phase is performed
over text corpora written by professionals, such as news articles. For instance, they return
sexist analogies like 𝑚𝑎𝑛 : 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑒𝑟 = 𝑤𝑜𝑚𝑎𝑛 : ℎ𝑜𝑚𝑒𝑚𝑎𝑘𝑒𝑟 [2]. The social bias in the
geometry of the model is reflected in downstream applications like web search or CV parsing. In
turn, this phenomenon favours prejudice towards social categories already frequently penalised,
such as women or African Americans.
   Lately, sentence embeddings – vector representations of sentences based on word embeddings
– are also increasing in popularity, improving results in many language understanding tasks, such
as semantic similarity or sentiment prediction [3, 4]. Therefore, it is of the utmost importance
to expand the research to understand how language models perceive the semantics of natural
language when computing the respective embedding. A very interesting step in this direction
is to define metrics to estimate social bias in sentence embeddings.
   This work expands research on social bias in embedding-based models, focusing on gender
bias in sentence representations. We propose a method to estimate gender bias in sentence
embeddings and perform our experiments on InferSent, a sentence encoder designed by Facebook
AI [3] based on GloVe1 , a very popular word embedding model. Our solution, named bias score,
is highly flexible and can be adapted to both different kinds of social bias and different language
models. Bias score will support research on procedures such as debiasing embeddings, which
require identifying biased embeddings and estimating the amount of bias they contain [2].
Similarly, techniques for improving training datasets require evaluating all the sentences they
contain, in order to identify problematic entries to remove, change, or compensate for.


2. State of the Art
Although language models are successfully used in a variety of applications, bias and fairness in
NLP have received relatively little consideration until recent times, running the risk of favouring
prejudice and strengthening stereotypes [5].

2.1. Bias in Word Embeddings
Static word embeddings were the first to be analysed. In 2016, they were shown to exhibit
the so-called gender bias, defined as the cosine of the angle between the word embedding of a
gender-neutral word and a one-dimensional subspace representing gender [2]. The approach
was later adapted for non-binary social biases such as racial and religious bias [6]. A debiasing
algorithm was also proposed to mitigate gender bias in word embeddings [2]; however, it was
later shown that it fails to entirely capture and remove it [7]. The Word Embedding Association
Test (WEAT) [8] was created to measure bias in word embeddings following the pattern of the
implicit-association test for humans. WEAT demonstrated the presence of harmful associations
in GloVe and word2vec2 embeddings. More recently, contextualised word embeddings like
BERT [9] proved to be very accurate language models. However, despite literature suggesting


1 https://nlp.stanford.edu/projects/glove/
2 https://code.google.com/archive/p/word2vec/
that they are generally less biased compared to their static counterparts [10], they still display a
significant amount of social bias [11].

2.2. Bias in Sentence Representations
Research is quite lacking regarding sentence-level representations. WEAT was extended to
measure bias in sentence embedding encoders: the Sentence Encoder Association Test (SEAT)
is again based on the evaluation of implicit associations and showed that modern sentence
embeddings also exhibit social bias [11]. Attempts at debiasing sentence embeddings faced
the issue of not being able to recognise neutral sentences, thus debiasing every representation
regardless of the gender attributes in the original natural language sentence [12].


3. Methodology
As already mentioned, gender bias in word embeddings is estimated using the cosine similarity
between word vectors and a gender direction identified in the vector space [2]. Cosine similarity
is a popular metric to compute the semantic similarity of words based on the angle between
their embedding vectors. Given two word vectors ⃗𝑢 and ⃗𝑣, cosine similarity is expressed as:

$$\cos(\vec{u}, \vec{v}) = \cos(\theta) = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|\,\|\vec{v}\|},$$
where 𝜃 is the angle between ⃗𝑢 and ⃗𝑣. The closer cos(𝜃) is to 1, the higher the semantic
similarity between ⃗𝑢 and ⃗𝑣. In word embedding models, similarity with respect to the gender
direction means that a word vector contains information about gender. Since only gender-
neutral words can be biased, gendered words like man or woman are assumed to contain correct
gender information.
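As a minimal sketch of this word-level estimation (following [2]), reusing the cosine helper from the earlier sketch and assuming a word-vector dictionary glove, a gender direction vector in the same space, and a set of gendered words (their construction is described in Sections 3.2 and 3.3):

def word_gender_bias(word, glove, gender_direction, gendered_words):
    # Gendered words are assumed to carry correct gender information: their bias is zero.
    if word in gendered_words:
        return 0.0
    # Gender-neutral words: bias is the cosine similarity with the gender direction.
    return cosine(glove[word], gender_direction)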
   When it comes to sentence representations, the main problem is that gender-neutral sentences
cannot be easily identified and listed like words, because sentences are infinite in number.
Moreover, sentences may contain gender bias despite being gendered. Consider the sentence
my mother is a nurse: the word mother contains correct gender semantics, but the word nurse
is female stereotyped. Table 1 shows that the gender-neutral sentence someone is a nurse still
contains a lot of gender information due to the bias associated with the word nurse.
   Therefore, it is important to distinguish between the amount of encoded gender information
coming from gendered words, and the amount coming from biased words. For this reason, we
adopt a more dynamic approach: we keep working at the word level, using the cosine similarity
between neutral word representations, and the gender direction to estimate word-level gender
bias. Then, we sum the bias of all the words in the sentence, adjusted according to the length of
the sentence and to the contextualised importance of each word. This decision is grounded on
two observations: first, the semantics of a sentence depends largely on the semantics of the
words contained in it; second, sentence embedding encoders are based on previously defined
word embedding models [3, 4]. We focus our research on InferSent by Facebook AI [3], a
sentence encoder that achieved great results in many different downstream tasks [13]. InferSent
encodes sentence representations starting from GloVe [14] word embeddings. Therefore, we
use GloVe for the first step of quantifying gender bias at the word level.
Table 1
Gender information with cosine similarity for sentence embeddings by InferSent [3] and SBERT [4].
                  input                  InferSent   SBERT     neutral?   biased?
                  my mother is there       0.28877   0.46506         no        no
                  my mother is a nurse     0.29852   0.46018         no       yes
                  someone is a nurse       0.18175   0.43965        yes       yes


3.1. Gender Bias Estimation
To estimate gender bias in sentence representations, we consider four elements:

    • cos(𝑥⃗ , ⃗𝑦 ): cosine similarity between two word vectors ⃗𝑥 and ⃗𝑦 ,
    • 𝐷: gender direction identified in the vector space,
    • 𝐿: list of gendered words in English,
    • 𝐼𝑤 : a percentage estimating the semantic importance of a word in the sentence.

Our metric, named bias score, takes a sentence as input, and returns two indicators corresponding
to the amount of female and male bias at sentence level. Respectively, they are a positive and a
negative value, obtained from the sum of the gender bias of all words, estimated from cosine
similarity with respect to the gender direction. Since gender bias is a characteristic of gender-
neutral words, gendered terms are excluded from the computation and instead their bias is
always set to zero. In detail, for each neutral word 𝑤 in the sentence we compute its gender
bias as the cosine similarity between its word vector 𝑒𝑚𝑏𝑤 and the gender direction 𝐷, and
then we multiply it by the word importance 𝐼𝑤. In particular, for a given sentence 𝑠:

$$BiasScore_F(s) = \sum_{\substack{w \in s \\ w \notin L}} \underbrace{\cos(emb_w, D) \times I_w}_{>\,0}$$

$$BiasScore_M(s) = \sum_{\substack{w \in s \\ w \notin L}} \underbrace{\cos(emb_w, D) \times I_w}_{<\,0}$$

Notice that, for each word 𝑤 that is gender-neutral, 𝑤 ∉ 𝐿. Also, word importance 𝐼𝑤 is always
a positive number, and the cosine similarity can be either positive or negative. Therefore, bias
score keeps the estimations of gender bias towards the male and female directions separated. In
the following sections, we go into more detail by illustrating how we derive 𝐷, 𝐿 and 𝐼𝑤 .
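A compact sketch of this computation, reusing the cosine helper defined earlier; glove is the word-vector dictionary, D the gender direction (Section 3.2), L the list of gendered words (Section 3.3) and importance a mapping from each word to its percentage importance 𝐼𝑤 (Section 3.4). Skipping out-of-vocabulary words is an implementation assumption, not part of the original formulation.

def bias_score(sentence_words, glove, D, L, importance):
    # Returns (female_bias, male_bias) for a tokenised sentence, as in Section 3.1.
    female, male = 0.0, 0.0
    for w in sentence_words:
        if w in L or w not in glove:
            continue   # gendered words contribute zero bias; OOV words are skipped (assumption)
        word_level_bias = cosine(glove[w], D) * importance[w]
        if word_level_bias > 0:
            female += word_level_bias   # positive cosine: bias towards the female direction
        else:
            male += word_level_bias     # negative cosine: bias towards the male direction
    return female, male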

3.2. Gender Direction
The first step of our method is to identify in the vector space a single dimension comprising
the majority of the gender semantics of the model. The resulting dimension, named gender
direction, serves as the first term in the cosine similarity function, to establish the amount of
gender semantics encoded in a vector for a given word, according to the model.
   GloVe [14], the word embedding model that we use, is characterised by a vector space of 300
dimensions. Inside the vector space, the difference between two embeddings returns the direction
that connects them. In the case of the embeddings $\vec{she}$ and $\vec{he}$, their difference vector $\vec{she} - \vec{he}$
represents a one-dimensional subspace that identifies gender in GloVe. However, the difference
vector $\vec{woman} - \vec{man}$ also identifies gender, yet it represents a slightly different subspace
compared to $\vec{she} - \vec{he}$. Therefore, following the approach in [2], we take into consideration
several pairs of gendered words and perform a Principal Component Analysis (PCA) to reduce
the dimensionality. We use ten pairs of gendered words: woman–man, girl–boy, she–he, mother–
father, daughter–son, gal–guy, female–male, her–his, herself–himself, Mary–John.
   As shown in Fig. 1, the top component resulting from the analysis is significantly more
important than the other components, explaining almost 60% of the variance. We use this
top component as gender direction, and we observe that embeddings of female words have a
positive cosine with respect to it, whereas for male words we have a negative cosine.
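A possible implementation of this step, following the pair-centring PCA of [2], is sketched below; it assumes scikit-learn and the glove dictionary introduced earlier, with the ten pairs present in the embedding vocabulary under the casing used here.

import numpy as np
from sklearn.decomposition import PCA

PAIRS = [("woman", "man"), ("girl", "boy"), ("she", "he"), ("mother", "father"),
         ("daughter", "son"), ("gal", "guy"), ("female", "male"),
         ("her", "his"), ("herself", "himself"), ("Mary", "John")]

def gender_direction(glove, pairs=PAIRS):
    # Centre each gendered pair around its mean, stack the centred vectors,
    # and take the top principal component as the gender direction.
    centred = []
    for female_word, male_word in pairs:
        mean = (glove[female_word] + glove[male_word]) / 2.0
        centred.append(glove[female_word] - mean)
        centred.append(glove[male_word] - mean)
    pca = PCA(n_components=10)
    pca.fit(np.array(centred))
    return pca.components_[0]   # top component, explaining roughly 60% of the variance (Fig. 1)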

3.3. Gendered Words
A list 𝐿 of gendered words is fundamental to estimate gender bias, because only gender-neutral
entities can be biased. Since the number of elements in the subset 𝒩 of gender-neutral words
in the vocabulary of a language is very large, whereas the subset 𝒢 of gendered words is relatively
small (especially in the case of the English language), we derive 𝒩 as the difference between
the complete vocabulary of the language 𝒱 and the subset 𝒢 of gendered words: 𝒩 = 𝒱 ∖ 𝒢.
To achieve this, we define a list 𝐿 of words containing as many of the elements of the subset 𝒢
as possible. Therefore, gender bias is estimated for all elements 𝑤𝑛 in the subset 𝒩 (neutral
words), whereas for all elements 𝑤𝑔 in the subset 𝒢 (gendered words) the gender bias is always
set to zero:
∀ 𝑤𝑛 ∈ 𝒩 , 𝑏𝑖𝑎𝑠(𝑤𝑛 ) ≠ 0
∀ 𝑤𝑔 ∈ 𝒢, 𝑏𝑖𝑎𝑠(𝑤𝑔 ) = 0
For this reason, the elements of 𝐿 are never considered when estimating gender bias. As a
matter of fact, we consider the gender information encoded in their word embeddings to be
always correct. Examples of gendered words include he, she, sister, girl, father, man.
  Our list 𝐿 contains a total of 6562 gendered nouns, of which 409 and 388 are respectively
lower-cased and capitalised common nouns taken from [2] and [15]. Additionally, we added
5765 unique given names taken from Social Security card applications in the United States3 .
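As a sketch of how 𝐿 might be assembled and queried (the file names below are placeholders for the resources cited above, not their actual names):

def load_gendered_list(*paths):
    # Union of all words found in the given files, one word per line.
    gendered = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            gendered.update(line.strip() for line in f if line.strip())
    return gendered

# Common nouns from [2] and [15] plus US given names (placeholder file names).
L = load_gendered_list("gendered_nouns.txt", "capitalised_nouns.txt", "us_given_names.txt")

def is_neutral(word, gendered_list):
    # N = V \ G: a word is treated as gender-neutral iff it does not appear in L.
    return word not in gendered_list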

3.4. Word Importance
Following the approach in [3], word importance is estimated based on the max-pooling operation
performed by the sentence encoder (in our case InferSent) using all vectors representing the
words in a given sentence. The procedure counts how many times each word representation is
selected by the sentence encoder during the max-pooling phase. In particular, we count the
number of times that the max-pooling procedure selects the hidden state ℎ𝑡 , for each time step
𝑡 in the neural network underlying the language model, with 𝑡 ∈ [0, . . . , 𝑇 ] and 𝑇 equal to the
number of words in the sentence. Note that ℎ𝑡 can be seen as a sentence representation centred
on the word 𝑤𝑡 , i.e. the word at position 𝑡 in the sentence.

3 https://www.kaggle.com/datagov/usa-names
Figure 1: Top ten components in PCA.
Figure 2: Example of word importance for the sentence A man is playing the saxophone.

   We consider both the absolute importance of each word, and the percentage with respect to the
total absolute importance of all the words in the sentence. For instance, in the example of Fig. 2,
the absolute importance of the word saxophone is 1106, meaning that its vector representation
is selected by the max-pooling procedure for 1106 dimensions out of the total 4096 dimensions
of the sentence embeddings computed by InferSent. The percentage importance is 1106/4096 ≈ 0.27,
meaning that the word counts for around 27% of the semantics of the sentence. In particular,
the percentage importance is also independent of the length of the sentence, despite the fact
that very long sentences generally have a more distributed semantics. For this reason, we use
the percentage importance to compute bias score.
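The counting step can be sketched as follows, assuming access to the matrix of per-word hidden states that feeds InferSent's max-pooling layer (how these states are extracted from the encoder is omitted here):

import numpy as np

def word_importance(hidden_states):
    # hidden_states: array of shape (T, d), one hidden state h_t per word,
    # with d = 4096 for InferSent. Max-pooling keeps, for every dimension,
    # the value coming from the word that maximises it.
    winners = np.argmax(hidden_states, axis=0)                         # winning word index per dimension
    absolute = np.bincount(winners, minlength=hidden_states.shape[0])  # times each word is selected
    percentage = absolute / hidden_states.shape[1]                     # percentage importance I_w
    return absolute, percentage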

3.5. Variant
Bias score makes it possible to discern gender bias towards the female and male directions. However, we
can also take the absolute value of each word-level bias to derive a single estimation of gender
bias at sentence level:

$$Abs\text{-}BiasScore(s) = \sum_{\substack{w \in s \\ w \notin L}} \underbrace{|\cos(emb_w, D) \times I_w|}_{\text{word-level bias}}$$
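Under the same assumptions as the earlier sketches, this variant amounts to summing the absolute word-level biases:

def abs_bias_score(sentence_words, glove, D, L, importance):
    # Single, direction-agnostic estimate of gender bias at sentence level (Section 3.5).
    return sum(abs(cosine(glove[w], D)) * importance[w]
               for w in sentence_words
               if w not in L and w in glove)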


4. Experimental Results
Table 2 illustrates an example of gender bias estimation via bias score, showing that gender
stereotyped concepts like wearing pink dresses are heavily internalised in the final sentence rep-
resentation. Additionally, we used bias score to estimate gender bias for sentences contained in
the Stanford Natural Language Inference (SNLI) corpus, a large collection of human-written En-
glish sentences for classification training [16]. SNLI contains more than 570k pairs of sentences,
and more than 600k unique sentences in the train set alone. According to our experiments,
sentences corresponding to the highest bias score towards the male direction describe situations
from popular sports like baseball and football, that are frequently associated with men and very
seldom with women. Similarly, sentences corresponding to the highest bias score in the female
direction illustrate female stereotypes, like participating in beauty pageants, applying make-up
or working as a nurse. Table 3 displays the most-biased SNLI sentences according to our metric.
Results are similar when estimating the absolute bias score. In particular, entries associated with
Table 2
Detailed bias score estimation for the sentence She likes the new pink dress.
                    word                  importance    gender bias    weighted bias
                    She                       12.13%        0.00000          0.00000
                    likes                     17.48%       -0.05719         -0.01000
                    the                        8.35%       -0.10195         -0.00851
                    new                       14.70%       -0.00051         -0.00008
                    pink                      12.84%        0.25705          0.03301
                    dress                     14.87%        0.28579          0.04249
                    overall female bias                                      0.07550
                    overall male bias                                       -0.01858

Table 3
Highest bias scores for sentences in SNLI train set, towards the female and male directions.
        sentence                                                                       bias score
        Beauty pageant wearing black clothing                                           0.134793
        Middle-aged blonde hula hooping.                                                0.127903
        A blonde child is wearing a pink bikini.                                        0.125145
        A showgirl is applying makeup.                                                  0.123312
        Football players scoring touchdowns                                            -0.149844
        Football players playing defense.                                              -0.140169
        A defensive player almost intercepted the football from the quarterback.       -0.139420
        Baseball players                                                               -0.138058


the highest absolute bias score include sentences with a high bias score in either the female
or male direction, like football players scoring touchdowns or the bikini is pink. Additionally,
sexualised sentences like the pregnant sexy volleyball player is hitting the ball are also present.


5. Conclusion and Future Work
In this paper we proposed an algorithm to estimate gender bias in sentence embeddings, based on
a novel metric named bias score. We discern between gender bias and correct gender information
encoded in a sentence embedding, and weigh bias on the basis of the semantic importance of each
word. We tested our solution on InferSent [3], searching for gender biased representations in a
corpus of natural language sentences. Since gender bias has been proven to be caused by the
internalisation of gender stereotypical associations [16], our algorithm for estimating bias score
allows us to identify which sentences encapsulate stereotypes the most in their vector representations.
   Future work will include adapting the proposed solution to different language models and
different kinds of social bias. Additionally, since bias score makes it possible to identify stereotypical
entries in natural language corpora used for training language models, removing or substituting
such entries may improve the fairness of the corpus. Thus, future work also includes re-training
language models on text corpora made fairer with this procedure, and comparing the resulting
models with the original ones in terms of both quality and fairness.
Acknowledgments
We are grateful to our mentor Letizia Tanca for her advice in the writing phase of this work.


References
 [1] T. Mikolov, W.-T. Yih, G. Zweig, Linguistic regularities in continuous space word repre-
     sentations, in: HLT-NAACL, 2013.
 [2] T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, Man is to computer programmer
     as woman is to homemaker? debiasing word embeddings, arXiv preprint arXiv:1607.06520
     (2016).
 [3] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. Bordes, Supervised learning of uni-
     versal sentence representations from natural language inference data, arXiv preprint
     arXiv:1705.02364 (2017).
 [4] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks,
     arXiv preprint arXiv:1908.10084 (2019).
 [5] K.-W. Chang, V. Prabhakaran, V. Ordonez, Bias and fairness in natural language processing,
     in: EMNLP-IJCNLP: Tutorial Abstracts, 2019.
 [6] T. Manzini, Y. C. Lim, Y. Tsvetkov, A. W. Black, Black is to criminal as caucasian is to
     police: Detecting and removing multiclass bias in word embeddings, arXiv preprint
     arXiv:1904.04047 (2019).
 [7] H. Gonen, Y. Goldberg, Lipstick on a pig: Debiasing methods cover up systematic gender
     biases in word embeddings but do not remove them, arXiv preprint arXiv:1903.03862
     (2019).
 [8] A. Caliskan, J. J. Bryson, A. Narayanan, Semantics derived automatically from language
     corpora contain human-like biases, Science 356 (2017) 183–186.
 [9] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[10] C. Basta, M. R. Costa-Jussà, N. Casas, Evaluating the underlying gender bias in contextual-
     ized word embeddings, arXiv preprint arXiv:1904.08783 (2019).
[11] C. May, A. Wang, S. Bordia, S. R. Bowman, R. Rudinger, On measuring social biases in
     sentence encoders, arXiv preprint arXiv:1903.10561 (2019).
[12] P. P. Liang, I. M. Li, E. Zheng, Y. C. Lim, R. Salakhutdinov, L.-P. Morency, Towards debiasing
     sentence representations, arXiv preprint arXiv:2007.08100 (2020).
[13] A. Conneau, D. Kiela, Senteval: An evaluation toolkit for universal sentence representa-
     tions, arXiv preprint arXiv:1803.05449 (2018).
[14] J. Pennington, R. Socher, C. D. Manning, Glove: Global vectors for word representation,
     in: EMNLP, 2014, pp. 1532–1543.
[15] J. Zhao, Y. Zhou, Z. Li, W. Wang, K.-W. Chang, Learning gender-neutral word embeddings,
     arXiv preprint arXiv:1809.01496 (2018).
[16] S. R. Bowman, G. Angeli, C. Potts, C. D. Manning, A large annotated corpus for learning
     natural language inference, arXiv preprint arXiv:1508.05326 (2015).