Conditional Synthetic Image Generation for Reducing Algorithmic Bias in Imbalanced Datasets

Shikhar Gupta, Joseph Thomas, Aadrij Upadya, Mihika Deshpande, and Pranav Singh

Computer Science & Engineering Department, Mui Group, at ASDRP

AI is increasingly performing sensitive facial recognition tasks: suspect identification, gender classification, race classification, and more
Prone to significant bias → 20.8% and 34.4% lower accuracies for black females compared to white males in Microsoft and IBM gender classifiers, respectively [1]

Often due to underrepresentation of minority groups in datasets

Objective is to build generative models capable of creating synthetic images of faces to balance datasets

Trained variational autoencoder (VAE) and generative adversarial network (GAN) on UTKFace dataset

Evaluated approach using a convolutional neural network (CNN) trained with real and synthetic faces to predict gender
Compared results to simple oversampling via duplication and DB-VAE, a state-of-the-art debiasing solution, using fairness metrics [2]
Achieved lower variation in subgroup (e.g. Asian male) accuracies while maintaining accuracy, reducing bias

Converted images to NumPy arrays with shape (3, 64, 64)
Built VAE architecture trained with 3 conditions: age, gender, and race

Mapped age values to the range 0 to 8 to reduce complexity

Generated 8,410 images of non-white faces and randomized gender and age to balance the distribution
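
One way to picture the conditioning step is as a label vector that is passed to the VAE alongside each image. The sketch below is illustrative only: the one-hot encoding, the label sizes (9 age values, 2 genders, 5 race codes following the UTKFace convention), and the helper names are assumptions, not details taken from this work.

```python
import random

# Assumed label sizes: 9 age values (0-8), 2 genders, and 5 race codes
# (UTKFace convention). The encoding is an illustrative choice only.
NUM_AGE, NUM_GENDER, NUM_RACE = 9, 2, 5
WHITE = 0  # UTKFace race code for "White"


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def condition_vector(age, gender, race):
    """Concatenate one-hot codes for age, gender, and race."""
    return (one_hot(age, NUM_AGE)
            + one_hot(gender, NUM_GENDER)
            + one_hot(race, NUM_RACE))


def sample_minority_conditions(n, rng):
    """Draw n conditions with a random non-white race, gender, and age."""
    conditions = []
    for _ in range(n):
        age = rng.randrange(NUM_AGE)
        gender = rng.randrange(NUM_GENDER)
        race = rng.randrange(1, NUM_RACE)  # starts at 1, so WHITE is skipped
        conditions.append(condition_vector(age, gender, race))
    return conditions


# One condition per synthetic image; a trained decoder (not shown here)
# would turn each condition plus a latent sample into a face image.
conditions = sample_minority_conditions(8410, random.Random(0))
```

Each vector has 16 entries with exactly three set to 1.0, so the decoder always receives one age, one gender, and one race label.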

Built GAN with generator and discriminator networks
Selected generated images with feature vectors matching attributes of minority groups
Sampled from latent distribution to produce another 8,410 images
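
The selection step above can be read as a form of rejection sampling: draw latent vectors, generate an image from each, and keep only the images whose attributes fall in a minority group. The sketch below assumes that reading. The generator, the attribute predictor, the latent size, and the minority test are all hypothetical stand-ins passed in as functions, since the poster does not specify them.

```python
import random

LATENT_DIM = 100  # hypothetical latent size, not taken from the poster


def sample_latent(rng, dim=LATENT_DIM):
    """Draw one latent vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def collect_matching(generate, predict_attributes, is_minority,
                     target, rng, max_tries=1_000_000):
    """Generate images until `target` of them match a minority group.

    generate, predict_attributes, and is_minority are caller-supplied
    functions; max_tries bounds the loop if matches are rare.
    """
    kept = []
    for _ in range(max_tries):
        if len(kept) >= target:
            break
        image = generate(sample_latent(rng))
        if is_minority(predict_attributes(image)):
            kept.append(image)
    return kept


# Toy stand-ins so the sketch runs without a trained GAN:
# the "image" is the latent vector itself, and "race" is a fake label.
kept = collect_matching(
    generate=lambda z: z,
    predict_attributes=lambda img: {"race": 1 if img[0] > 0 else 0},
    is_minority=lambda attrs: attrs["race"] != 0,
    target=50,
    rng=random.Random(0),
)
```

In practice the three callables would wrap the trained generator and whatever model or metadata supplies the attributes.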

Performed 5 tests:
Trained a CNN to predict gender on the 4,187 sampled images with an imbalanced distribution
Applied random oversampling by adding 8,410 duplicated images of minority groups and retraining the CNN
Implemented available DB-VAE network that re-weights data points of minority groups using learned latent structure
Trained and tested CNN with real and fake VAE faces
Trained and tested CNN with real and fake GAN faces
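
For reference, the random oversampling baseline in the second test can be sketched as follows. This is a minimal illustration, assuming that every subgroup is topped up to the size of the largest one; the poster states only that 8,410 duplicated minority images were added, so the balancing target and the function name are assumptions.

```python
import random
from collections import Counter


def oversample_by_duplication(samples, group_of, rng):
    """Duplicate random members of smaller groups until all groups
    are as large as the largest one. Originals are kept unchanged."""
    if not samples:
        return []
    groups = {}
    for sample in samples:
        groups.setdefault(group_of(sample), []).append(sample)
    target = max(len(members) for members in groups.values())
    balanced = list(samples)
    for members in groups.values():
        shortfall = target - len(members)
        balanced.extend(rng.choice(members) for _ in range(shortfall))
    return balanced


# Toy data: (image id, subgroup) pairs with a 6 / 2 / 1 imbalance.
samples = ([(f"img_{i}", "White male") for i in range(6)]
           + [("img_6", "Black female"), ("img_7", "Black female"),
              ("img_8", "Asian male")])
balanced = oversample_by_duplication(
    samples, group_of=lambda s: s[1], rng=random.Random(0))
counts = Counter(label for _, label in balanced)
```

Because the added items are exact copies, the classifier sees no new variation within the minority groups, which is the overfitting risk that synthetic generation is meant to avoid.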

Accuracy Scores Per Subgroup

Cropped images from the UTKFace dataset used for generation - 23,709 samples
Randomly sampled 5,000 images for training a CNN that predicts gender
Increased severity of imbalance between race and gender subgroups to develop an initially biased classifier
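
Subgroup sizes such as these can be checked directly from UTKFace file names, which encode the labels as age_gender_race_timestamp. The sketch below counts race and gender subgroups that way; the example file names are made up to follow that pattern, and the helper names are hypothetical.

```python
from collections import Counter

# UTKFace label codes.
RACES = {0: "White", 1: "Black", 2: "Asian", 3: "Indian", 4: "Others"}
GENDERS = {0: "male", 1: "female"}


def parse_utkface_name(filename):
    """Return (age, gender, race) from a UTKFace file name,
    or None if the name does not follow the expected pattern."""
    parts = filename.split("_")
    if len(parts) < 4:
        return None
    try:
        age, gender, race = int(parts[0]), int(parts[1]), int(parts[2])
    except ValueError:
        return None
    if gender not in GENDERS or race not in RACES:
        return None
    return age, gender, race


def subgroup_counts(filenames):
    """Count images per race and gender subgroup, skipping bad names."""
    counts = Counter()
    for name in filenames:
        parsed = parse_utkface_name(name)
        if parsed is None:
            continue
        _, gender, race = parsed
        counts[f"{RACES[race]} {GENDERS[gender]}"] += 1
    return counts


# Made-up names in the UTKFace pattern; the last one lacks a race field.
names = [
    "25_0_0_20170116174525125.jpg.chip.jpg",
    "31_1_2_20170104023102422.jpg.chip.jpg",
    "40_0_0_20170117135931216.jpg.chip.jpg",
    "61_1_20170109150557335.jpg.chip.jpg",
]
counts = subgroup_counts(names)
```

Comparing these counts before and after sampling shows how severe the imbalance between subgroups is.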

In this research, we show that using deep learning methods to generate synthetic images to balance imbalanced facial datasets can improve performance on minority classes significantly
As much as a 32% increase in intraclass performance
Comparable or better results than existing debiasing approaches (DB-VAE, random oversampling)
Allows expansion of dataset while avoiding addition of noisy data that can confuse model
Future goals:

Experiment with more complex networks such as VAE-GANs and tune hyperparameters
Use image enhancement to improve the quality and diversity of generated images
Leverage unsupervised learning (e.g. clustering) to find minority groups of images without provided attributes and features
Apply the model to other tasks such as multiclass classification (race, age, etc.)

Abstract

Data

Methodology

Results

Conclusion/Future Work

References

Introduction

[1] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. 77–91.

[2] Alexander Amini, Ava P. Soleimany, Wilko Schwarting, Sangeeta N. Bhatia, and Daniela Rus. 2019. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. In 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES’19), January 27–28, 2019, Honolulu, HI, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3306618.331424

[3] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357.

[4] V. Sampath, I. Maurtua, J. J. Aguilar Martin, and A. Gutierrez. 2021. A survey on generative adversarial networks for imbalance problems in computer vision tasks. Journal of Big Data. Retrieved August 16, 2022, from https://doi.org/10.1186/s40537-021-00414-0

[5] Diederik P. Kingma and Max Welling (2019), “An Introduction to Variational Autoencoders”, Foundations and Trends in Machine Learning: Vol. xx, No. xx, pp 1–18. DOI: 10.1561/XXXXXXXXX.

[6] Ian Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems, 2672–2680.

Acknowledgements

Data bias in face datasets has caused many facial recognition algorithms to perform poorly on racial and gender subgroups
Research has shown that face detection software from companies such as Amazon, Facebook, IBM, and Microsoft is biased against people of color or women [1]

Up to a 34.4% difference in accuracy between white males and black females

Dataset imbalance is a common cause of bias → current techniques, including oversampling and downsampling, aim to restore balance [2], [3]

These techniques often risk losing data (downsampling) or overfitting when underrepresented classes are duplicated repeatedly (oversampling)

In recent years, new unsupervised generative techniques such as VAEs and GANs have proven successful in creating realistic images given a set of data [4]

Known for use in medical settings [4]

Current research has explored the effectiveness of these models, though measuring their ability to reduce bias with robust fairness metrics has yet to be investigated

We would like to thank Dr. Mui for his guidance and the Aspiring Scholars Directed Research Program (ASDRP) for providing us the opportunity to conduct artificial intelligence research this year.

Sample faces: Black male, Asian female, Black male, Indian female

Underrepresentation → bias against women and people of color
Higher differences in gender accuracy for non-white faces

Debiasing VAE (DB-VAE) → decreased gap in accuracies between subgroups but caused a ~10% drop in overall accuracy

Duplication → overfitting on minority images
Increased gap in accuracy between male and female classes

Balancing with VAEs → decreased gap in accuracies between male and female classes with minimal drop in overall accuracy

Balancing with GANs → largely similar, though performance increased for black and Indian females

Test                   Standard Deviation
Imbalanced Data        17.67
Oversampled Data       31.94
Data with VAE Faces    6.92
Data with GAN Faces    11.91
DB-VAE                 11.28

Calculated standard deviation for the subgroup accuracies in each test

Lower value indicates less bias
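
The metric can be sketched in a few lines. This is a minimal illustration, assuming the population standard deviation over per-subgroup accuracies in percent; the poster does not say whether the population or sample form was used, and the accuracy values below are hypothetical rather than results from this work.

```python
from statistics import pstdev


def accuracy_spread(subgroup_accuracies):
    """Population standard deviation of per-subgroup accuracies.

    A value of 0 means every subgroup is classified equally well;
    larger values mean accuracy varies more between subgroups.
    """
    return pstdev(subgroup_accuracies.values())


# Hypothetical per-subgroup accuracies in percent, for illustration only.
example = {
    "White male": 90,
    "White female": 80,
    "Black male": 70,
    "Black female": 60,
}
spread = accuracy_spread(example)
```

Because the measure ignores the mean, it should be read together with overall accuracy: a classifier that is equally poor on every subgroup would also score a low spread.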
