Auditing Gender Analyzers on Text Data
Siddharth D Jaiswal, Ankit Kumar Verma, Animesh Mukherjee
Dept of CSE, Indian Institute of Technology, Kharagpur, India
IEEE/ACM ASONAM 2023
Introduction & Motivation
[1] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women, Reuters, 2018
[2] S. Mukherjee et al., Gender classification of microblog text based on authorial style, Information Systems and e-Business Management, 2017
[3] Cheng et al., Author gender identification from text, Digital Investigation, 2011
@johndoe
#CNeRGRetreat is going great! Looking forward to more talks!
Gender Analyzer
> John Doe is … Male
API
How do we evaluate such software?
Audit[4]
Code Audit
Scraping Audit
Sockpuppet Audit
Collaborative Audit
Used for Face Recognition, E-commerce, Speech Recognition, etc.
[Diagram labels: User, Researcher, Sock Puppets, Gender Analyzer, Platform]
[4] Christian Sandvig et al., Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, ICA, 2014
Gender Analyzers Evaluated
uClassify
Readable
HackerFactor
ChatGPT*
Subscription
Claimed Accuracy
Training Data
Paid
Paid
Open Source
Licensed
X
X
70%
70%
X
X
Blogs
Wikipedia & Book Corpus
Datasets
Reddit: shortlist gender-specific subreddits (r/men, r/NonBinary, etc.) → Pushshift API → top-level comments + flairs → data cleaning → Reddit Dataset
Tumblr: shortlist blogs based on gender info (Male, Female, Non-binary, etc.) → Tumblr API → posts + self-declared gender → data cleaning → Tumblr Dataset
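The Reddit branch of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each comment is a dict with `body`, `parent_id`, and `author_flair_text` fields, and it relies on the Reddit convention that a `parent_id` starting with `t3_` points at a submission, which makes the comment top-level. The cleaning rules (dropping URLs, collapsing whitespace, skipping deleted bodies) are assumptions, since the slides do not list the actual cleaning steps.

```python
import re

# Bodies that carry no author text (assumed filter, not from the slides).
DELETED = {"[deleted]", "[removed]", ""}

def is_top_level(comment: dict) -> bool:
    # "t3_" marks a submission, so the comment replies to the post itself.
    return str(comment.get("parent_id", "")).startswith("t3_")

def clean_text(text: str) -> str:
    # Assumed cleaning: strip URLs, then collapse runs of whitespace.
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def prepare(comments: list) -> list:
    rows = []
    for c in comments:
        if not is_top_level(c):
            continue
        body = clean_text(c.get("body") or "")
        if body in DELETED:
            continue
        # Keep the flair next to the text; it is the source of the gender label.
        rows.append({"text": body, "flair": c.get("author_flair_text")})
    return rows

# Hypothetical sample records for illustration only.
sample = [
    {"body": "Great  thread, see https://example.com", "parent_id": "t3_abc",
     "author_flair_text": "he/him"},
    {"body": "A reply", "parent_id": "t1_def", "author_flair_text": None},
    {"body": "[deleted]", "parent_id": "t3_abc", "author_flair_text": None},
]
prepared = prepare(sample)
```

Only the first sample record survives: the second is a reply to another comment, and the third has no usable body.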
Dataset Statistics
Reddit
Gender | # Comments | Avg. Length | # Subreddits |
Male | 240k | 99 | 3 |
Female | 240k | 99 | 2 |
Non-Binary | 180k | 80 | 7 |
Total | 660k | 93 | 12 |
Tumblr
Gender | # Posts | Avg. Length | # Blogs |
Male | 343k | 99 | 230 |
Female | 704k | 141 | 688 |
Non-Binary | 1.01M | 132 | 1670 |
Total | 2.05M | 124 | 2588 |
Experimental Methodology
Dataset → CSV → Gender Analyzer → Analysis
Gender Analyzer | Access Method |
uClassify | REST API |
Readable | Selenium |
HackerFactor | Source Code |
ChatGPT | Prompt |
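The flow on this slide (a CSV of texts fed to each analyzer, with the outputs collected for analysis) can be sketched as a small harness. Everything here is hypothetical: the column names `text` and `gender`, the `run_audit` helper, and `toy_analyzer`, which is a placeholder standing in for a real client (REST call, Selenium session, local source code, or prompt) and does not reflect how any of the audited tools works.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

def run_audit(rows: Iterable[dict],
              analyzers: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Apply every analyzer to every text and keep the true label alongside."""
    results = []
    for row in rows:
        out = {"text": row["text"], "gender": row["gender"]}
        for name, predict in analyzers.items():
            out[name] = predict(row["text"])
        results.append(out)
    return results

def toy_analyzer(text: str) -> str:
    # Placeholder rule for illustration only; not a real gender analyzer.
    return "female" if "!" in text else "male"

# In practice this would be a file on disk; an in-memory CSV keeps the sketch self-contained.
csv_text = "text,gender\nLooking forward to more talks!,male\nSee you there.,female\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
results = run_audit(rows, {"toy": toy_analyzer})
```

Wrapping each tool behind the same `str -> str` callable keeps the analysis step independent of how a given tool is reached.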
Initial Audit Results on Overall Dataset
[Figure: accuracy (%) of uClassify, Readable, and HackerFactor on the overall dataset; panel: Tumblr]
All platforms report low accuracy, even for binary gender prediction!
Initial Audit Results for Male & Female Data
[Figure: accuracy (%) of uClassify, Readable, and HackerFactor on male (M) and female (F) authored text; panel: Tumblr]
All platforms report higher accuracy for females on both datasets!
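Per-gender accuracy of the kind compared on this slide can be computed as below. This is a minimal sketch under assumed inputs: `records` is a hypothetical list of (true label, predicted label) pairs, and the toy values are invented purely to exercise the function; they are not results from the audit.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return accuracy in percent for each true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in records:
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted)
    return {group: 100.0 * correct[group] / total[group] for group in total}

# Invented toy pairs (true, predicted); not data from the paper.
toy = [("M", "M"), ("M", "F"), ("M", "F"), ("M", "M"),
       ("F", "F"), ("F", "F"), ("F", "F"), ("F", "M")]
per_group = accuracy_by_group(toy)
```

Reporting accuracy per true label, rather than one pooled number, is what exposes a gap between groups.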
Initial Audit Results on Non-binary Data
[Figure: Male, Female, and Neutral/Unknown predictions of uClassify, Readable, and HackerFactor on non-binary authors' text; panel: Tumblr]
All platforms predict “Female” for non-binary authors’ text!
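Because these tools have no non-binary output class, the audit looks at how their predictions are distributed for non-binary authors' text. A minimal sketch of that tally, assuming a hypothetical flat list of predicted labels (the toy list is invented for illustration):

```python
from collections import Counter

def prediction_shares(predictions):
    """Return the percentage of predictions falling on each output label."""
    counts = Counter(predictions)
    n = len(predictions)
    return {label: 100.0 * count / n for label, count in counts.items()}

# Invented toy predictions for texts by non-binary authors; not audit data.
toy_predictions = ["Female", "Female", "Male", "Female", "Neutral/Unknown"]
shares = prediction_shares(toy_predictions)
```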
Audit on ChatGPT
[Figure: ChatGPT accuracy (%) on Male, Female, and NonBinary text; panel: Tumblr]
ChatGPT performs poorly on Tumblr data.
A more inclusive Gender Analyzer: BERT*
A multi-label BERT-base classifier fine-tuned on data from both platforms in multiple combinations
BERT Models
Dataset (660k) | Model Name |
Reddit | Reddit-BERT |
Tumblr | Tumblr-BERT |
Reddit [50%] + Tumblr [50%] | RT-BERT |
Reddit [f] + Tumblr [k] | RfTk |
Tumblr [f] + Reddit [k] | TfRk |
f : fine-tuning
k : further fine-tuning with k shots
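The RT-BERT row draws half of its training data from each platform. One way such a mix could be assembled is sketched below; the helper name `mix_half_half`, the random sampling strategy, and the fixed seed are all assumptions, since the slides state only the 50%/50% split.

```python
import random

def mix_half_half(reddit, tumblr, total, seed=0):
    """Sample total examples, half from each platform, then shuffle them."""
    rng = random.Random(seed)
    half = total // 2
    # rng.sample raises ValueError if a platform has fewer examples than requested.
    mixed = rng.sample(reddit, half) + rng.sample(tumblr, total - half)
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the two datasets, tagged by source for inspection.
reddit = [("reddit", i) for i in range(100)]
tumblr = [("tumblr", i) for i in range(100)]
mixed = mix_half_half(reddit, tumblr, total=20)
```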
Performance for Male & Female authored text
[Figure: accuracy (%) of Reddit-BERT and Tumblr-BERT on male and female authored text; panel: Tumblr]
Fine-tuning on Reddit works better than fine-tuning on Tumblr
Performance for Non-binary authored text
[Figure: Male, Female, and NonBinary predictions of Reddit-BERT and Tumblr-BERT on non-binary authored text]
Reddit-BERT is a better non-binary predictor than Tumblr-BERT!
Cross Platform Performance
[Figure: accuracy (%) on Male, Female, and NonBinary text for each cross-platform setting]
Reddit-BERT on Tumblr Comments | Accuracy: 37% |
Tumblr-BERT on Reddit Comments | Accuracy: 38% |
Tumblr-BERT is a better predictor on Reddit data!
Fine-tuning on mixed dataset
[Figure: accuracy (%) of RT-BERT on Male, Female, and NonBinary text]
RT-BERT on Reddit Comments | Accuracy: 79% |
RT-BERT on Tumblr Comments | Accuracy: 69% |
RT-BERT performs better on both datasets!
Few shot learning with limited examples
[Figure: accuracy (%) against the number of shots k; annotated values: 73% and 90%]
TfRk is the best performing model on the Reddit dataset!
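Selecting the k examples for the second fine-tuning stage might look like the sketch below. It assumes "k shots" means k examples per gender label, which the slides do not state, and the `sample_k_shots` helper and toy pool are hypothetical.

```python
import random
from collections import defaultdict

def sample_k_shots(examples, k, seed=0):
    """Pick up to k (text, label) pairs per label (assumed: k is per class)."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    shots = []
    for label in sorted(by_label):  # sorted so the output order is reproducible
        pool = by_label[label]
        shots.extend(rng.sample(pool, min(k, len(pool))))
    return shots

# Toy target-platform pool; for TfRk this would be Reddit comments.
pool = [(f"text {i}", label)
        for i in range(10)
        for label in ("male", "female", "nonbinary")]
shots = sample_k_shots(pool, k=4)
```

A model already fine-tuned on the source platform would then be trained further on `shots` only.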
Final Takeaways
Thank You!
@siddsjaiswal | siddsjaiswal@kgpian.iitkgp.ac.in
@ankitverma5859 | ankitverma@kgpian.iitkgp.ac.in
@Animesh43061078 | animeshm@cse.iitkgp.ac.in