1 of 20

Auditing Gender Analyzers on Text Data

Siddharth D Jaiswal, Ankit Kumar Verma, Animesh Mukherjee

Dept of CSE, Indian Institute of Technology, Kharagpur, India

IEEE/ACM ASONAM 2023

2 of 20

Introduction & Motivation

  • Gender Analyzers: predict the gender of an individual from input
    • In this work: text-based Gender Analyzers
  • Use Cases: Recruitment[1], Advertising[2], Safety on social media[3], etc.
  • Existing Tools: only binary (M/F) gender labels
    • What about non-binary and other gender groups?

[1] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women, Reuters, 2018

[2] S. Mukherjee et al., Gender classification of microblog text based on authorial style, Information Systems and e-Business Management, 2017

[3] Cheng et al., Author gender identification from text, Digital Investigation, 2011

[Figure: a post by @johndoe, "#CNeRGRetreat is going great! Looking forward to more talks!", is passed to a Gender Analyzer API, which outputs "> John Doe is … Male"]

3 of 20

How do we evaluate this software?

Audit[4]
  • Code Audit
  • Scraping Audit
  • Sockpuppet Audit
  • Collab. Audit

Used for Face Recognition, E-commerce, Speech Recognition, etc.

[Figure: audit setup with the labels User, Researcher, Sock Puppets, Gender Analyzer, Platform]

[4] Christian Sandvig et al., Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, ICA, 2014

4 of 20

Gender Analyzers Evaluated

Analyzers: uClassify, Readable, HackerFactor, ChatGPT*

  • Subscription: Paid, Paid, Open Source, Licensed
  • Claimed Accuracy, Training Data: X, X, 70%, 70%, X, X, Blogs, Wikipedia & Book Corpus

5 of 20

Datasets

Reddit
  • Shortlist gender-specific subreddits: r/men, r/NonBinary, etc.
  • Pushshift API
  • Top Level Comments + Flairs
  • Data Cleaning
  • Reddit Dataset

Tumblr
  • Shortlist Blogs based on gender info: Male, Female, Non-binary, etc.
  • Tumblr API
  • Posts + Self-declared Gender
  • Data Cleaning
  • Tumblr Dataset
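The Reddit steps above (top-level comments, flairs, data cleaning) could look roughly like the sketch below. It is an illustration, not the authors' code. It assumes comment records shaped like Reddit's comment JSON, with the fields `body`, `parent_id`, `author_flair_text`, and `subreddit`, where a top-level comment has a `parent_id` that starts with `t3_`. The helper names, the sample records, and the cleaning rule (dropping empty, deleted, or removed bodies) are assumptions.

```python
from typing import Iterable


def is_top_level(comment: dict) -> bool:
    """A comment is top-level when its parent is a submission (prefix 't3_')."""
    return str(comment.get("parent_id", "")).startswith("t3_")


def clean_comments(comments: Iterable[dict]) -> list:
    """Keep top-level comments that carry a flair and a usable body."""
    kept = []
    for c in comments:
        body = (c.get("body") or "").strip()
        flair = (c.get("author_flair_text") or "").strip()
        if not is_top_level(c) or not flair:
            continue  # replies and unflaired comments are skipped
        if body in ("", "[deleted]", "[removed]"):
            continue  # assumed cleaning rule: drop empty or deleted text
        kept.append({"text": body, "flair": flair, "subreddit": c.get("subreddit")})
    return kept


# Hypothetical records, for illustration only.
sample = [
    {"body": "Hello there", "parent_id": "t3_abc",
     "author_flair_text": "they/them", "subreddit": "NonBinary"},
    {"body": "A reply", "parent_id": "t1_def",
     "author_flair_text": "he/him", "subreddit": "men"},
    {"body": "[deleted]", "parent_id": "t3_ghi",
     "author_flair_text": "he/him", "subreddit": "men"},
    {"body": "No flair", "parent_id": "t3_jkl",
     "author_flair_text": None, "subreddit": "men"},
]
cleaned = clean_comments(sample)
print(cleaned)
```

Only the first record survives: the second is a reply, the third is deleted, and the fourth has no flair.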

6 of 20

Dataset Statistics

Reddit
              # Comments   Avg. Length   # Subreddits
Male          240k         99            3
Female        240k         99            2
Non-Binary    180k         80            7
Total         660k         93            12

Tumblr
              # Posts      Avg. Length   # Blogs
Male          343k         99            230
Female        704k         141           688
Non-Binary    1.01M        132           1670
Total         2.05M        124           2588

7 of 20

Experimental Methodology

Dataset → Gender Analyzers (one CSV per analyzer) → Analysis
  • uClassify: REST API
  • Readable: Selenium
  • HackerFactor: Source Code
  • ChatGPT: Prompt
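The flow on this slide (dataset in, one CSV of predictions per analyzer, then analysis) can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the authors' pipeline: the column names `text`, `gender`, and `predicted` are hypothetical, and `dummy_analyzer` is a placeholder for whichever access method an analyzer needs (REST API call, Selenium session, local source code, or prompt).

```python
import csv
import io
from typing import Callable


def run_audit(dataset_csv: str, analyzer: Callable[[str], str]) -> str:
    """Send every text in the dataset to one analyzer and return a results CSV."""
    reader = csv.DictReader(io.StringIO(dataset_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["text", "gender", "predicted"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "text": row["text"],
            "gender": row["gender"],             # self-declared label from the dataset
            "predicted": analyzer(row["text"]),  # label returned by the analyzer
        })
    return out.getvalue()


def dummy_analyzer(text: str) -> str:
    """Placeholder analyzer (assumption): always answers 'female'."""
    return "female"


# Hypothetical two-row dataset, for illustration only.
dataset_csv = "text,gender\nfirst comment,male\nsecond comment,nonbinary\n"
results_csv = run_audit(dataset_csv, dummy_analyzer)
print(results_csv)
```

Keeping the analyzer behind a single callable means the same harness and the same downstream analysis can be reused for all four tools.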

8 of 20

Initial Audit Results on Overall Dataset

All platforms report low accuracy, even for binary gender prediction!

[Figure: Accuracy % of uClassify, Readable, and HackerFactor on the Reddit and Tumblr datasets]

9 of 20

Initial Audit Results for Male & Female Data

[Figure: Accuracy % of uClassify, Readable, and HackerFactor for male (M) and female (F) authors on the Reddit and Tumblr datasets]

All platforms report higher accuracy for females on both datasets!
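A per-gender comparison like the one above comes down to computing accuracy separately for each true label. The sketch below shows one way to do that. The function name and the toy pairs are assumptions for illustration and are not results from the audit.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (true_label, predicted_label) pairs.

    Returns accuracy in percent for each true label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in records:
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted)
    return {group: 100.0 * correct[group] / total[group] for group in total}


# Toy pairs (made up): two male-authored and two female-authored texts.
toy = [("M", "M"), ("M", "F"), ("F", "F"), ("F", "F")]
per_group = accuracy_by_group(toy)
print(per_group)  # {'M': 50.0, 'F': 100.0}
```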

10 of 20

Initial Audit Results on Non-binary Data

[Figure: Male, Female, and Neutral/Unknown predictions of uClassify, Readable, and HackerFactor on non-binary text from Reddit and Tumblr]

All platforms predict “Female” for non-binary authors’ text!
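Because the audited tools have no non-binary label, accuracy is not meaningful for this group. A natural alternative is the share of each predicted label. The sketch below computes that distribution. The function name and toy predictions are hypothetical and do not reproduce the audit's numbers.

```python
from collections import Counter


def prediction_distribution(predictions):
    """Return the percentage of each predicted label in a list of predictions."""
    counts = Counter(predictions)
    n = sum(counts.values())
    return {label: 100.0 * count / n for label, count in counts.items()}


# Toy predictions (made up) for four texts written by non-binary authors.
toy_predictions = ["Female", "Female", "Male", "Neutral/Unknown"]
shares = prediction_distribution(toy_predictions)
print(shares)  # {'Female': 50.0, 'Male': 25.0, 'Neutral/Unknown': 25.0}
```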

11 of 20

Audit on ChatGPT

[Figure: Accuracy % of ChatGPT for Male, Female, and NonBinary text on Reddit and Tumblr]

ChatGPT performs poorly on Tumblr

12 of 20

A more inclusive Gender Analyzer- BERT*

  • Existing platforms are inaccurate AND biased!
    • Leverage pre-trained Language Models?
  • Fine-tune a BERT-base model on our dataset
    • Both datasets already have Non-binary gender labels!

A multi-label BERT-base classifier fine-tuned on data from both platforms in multiple combinations

13 of 20

BERT Models

Dataset (660k)                   Model Name
Reddit                           Reddit-BERT
Tumblr                           Tumblr-BERT
Reddit [50%] + Tumblr [50%]      RT-BERT
Reddit [f] + Tumblr [k]          RfTk
Tumblr [f] + Reddit [k]          TfRk

f: fine-tuning
k: further fine-tuning with k shots
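The training-set variants above differ only in how examples are selected. The sketch below illustrates two of the selection steps: a 50/50 mix of both platforms (as for RT-BERT) and a k-shot sample for the second fine-tuning stage (as for RfTk and TfRk). It is not the authors' code. Reading "k-shots" as k examples per gender label is an assumption, as are the function names, the seed handling, and the toy data. Model training itself is left out.

```python
import random
from collections import defaultdict


def mix_half_half(reddit, tumblr, size, seed=0):
    """Draw size // 2 examples from each platform, then shuffle them together."""
    rng = random.Random(seed)
    half = size // 2
    mixed = rng.sample(reddit, half) + rng.sample(tumblr, half)
    rng.shuffle(mixed)
    return mixed


def k_shot(examples, k, seed=0):
    """Pick k examples per label (assumed meaning of 'k-shots')."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    shots = []
    for label in sorted(by_label):
        # random.sample raises ValueError if a label has fewer than k examples
        shots.extend(rng.sample(by_label[label], k))
    return shots


# Toy (text, label) pairs, made up for illustration.
labels = ["male", "female", "nonbinary"] * 4
reddit = [(f"reddit text {i}", label) for i, label in enumerate(labels)]
tumblr = [(f"tumblr text {i}", label) for i, label in enumerate(labels)]

mixed = mix_half_half(reddit, tumblr, size=6)  # candidate training set for a mixed model
shots = k_shot(tumblr, k=2)                    # second-stage data after fine-tuning on Reddit
print(len(mixed), len(shots))
```

Fixing the seed keeps the sampled subsets reproducible across runs, which matters when comparing models trained on different combinations.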

14 of 20

Performance for Male & Female authored text

Fine-tuning on Reddit works out better than on Tumblr

[Figure: Accuracy % of Reddit-BERT and Tumblr-BERT on the Reddit and Tumblr datasets]

15 of 20

Performance for Non-binary authored text

[Figure: Male, Female, and NonBinary predictions of Reddit-BERT and Tumblr-BERT]

Reddit-BERT is a better non-binary predictor than Tumblr-BERT!

16 of 20

Cross Platform Performance

  • Reddit-BERT on Tumblr Comments, Accuracy: 37%
  • Tumblr-BERT on Reddit Comments, Accuracy: 38%

[Figure: Accuracy % for Male, Female, and NonBinary text in each setting]

Tumblr-BERT is a better predictor on Reddit data!

17 of 20

Fine-tuning on mixed dataset

  • RT-BERT on Reddit Comments, Accuracy: 79%
  • RT-BERT on Tumblr Comments, Accuracy: 69%

[Figure: Accuracy % for Male, Female, and NonBinary text in each setting]

RT-BERT performs better on both datasets!

18 of 20

Few shot learning with limited examples

[Figure: Accuracy % versus k, with annotated values 73% and 90%]

TfRk is the best performing model on the Reddit dataset!

19 of 20

Final Takeaways

  • The audited gender analyzers performed poorly even on comments/blog posts written by male and female authors.

  • The audited gender analyzers predicted “female” for a majority of the comments/blog posts written by non-binary authors.

  • LLMs fine-tuned on inclusive, labelled data can be used to create a more inclusive Gender Analyzer.

  • Pre-trained LLMs can also be generalized with transfer learning.

20 of 20

Thank You!

  • Siddharth D Jaiswal: @siddsjaiswal, siddsjaiswal@kgpian.iitkgp.ac.in
  • Animesh Mukherjee: @Animesh43061078, animeshm@cse.iitkgp.ac.in
  • Ankit Kumar Verma: @ankitverma5859, ankitverma@kgpian.iitkgp.ac.in