
Conditional Synthetic Image Generation for Reducing Algorithmic Bias in Imbalanced Datasets

Shikhar Gupta, Joseph Thomas, Aadrij Upadya, Mihika Deshpande, and Pranav Singh

Computer Science & Engineering Department, Mui Group, at ASDRP

  • AI is increasingly used for sensitive facial recognition tasks: suspect identification, gender classification, race classification, and more
  • Prone to significant bias → 20.8% and 34.4% lower accuracies for black females compared to white males in Microsoft and IBM gender classifiers, respectively [1]
    • Often due to underrepresentation of minority groups in datasets
  • Objective: build generative models that create synthetic face images to balance datasets
    • Trained a variational autoencoder (VAE) and a generative adversarial network (GAN) on the UTKFace dataset
  • Evaluated the approach by training a convolutional neural network (CNN) on real and synthetic faces to predict gender
  • Compared results with simple oversampling via duplication and with DB-VAE, a state-of-the-art debiasing solution [2], using fairness metrics
  • Achieved lower variation in subgroup (e.g. Asian male) accuracies while maintaining overall accuracy, reducing bias
  • Converted images to NumPy arrays with shape (3, 64, 64)
  • Built a conditional VAE trained with 3 conditions: age, gender, and race
    • Binned age into values from 0 to 8 to reduce complexity
  • Generated 8,410 images of non-white faces, randomizing gender and age to balance the distribution
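The preprocessing and conditioning steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code, and it rests on several assumptions: the standard UTKFace filename convention `[age]_[gender]_[race]_[datetime].jpg`, one-hot condition vectors, a decade-based age binning clipped to bin 8 (the poster does not state the actual mapping), a hypothetical latent size of 128, and "non-white" meaning UTKFace race labels 1 to 4. The trained decoder is not shown; `sample_minority_inputs` only builds the latent codes and conditions it would consume.

```python
import numpy as np

NUM_AGE_BINS = 9   # ages mapped to 0..8
NUM_GENDERS = 2    # UTKFace convention: 0 = male, 1 = female
NUM_RACES = 5      # UTKFace convention: 0 White, 1 Black, 2 Asian, 3 Indian, 4 Others
LATENT_DIM = 128   # hypothetical latent size


def parse_utkface_name(filename):
    """Parse '[age]_[gender]_[race]_[datetime].jpg' into (age, gender, race)."""
    age, gender, race = filename.split("_")[:3]
    return int(age), int(gender), int(race)


def age_to_bin(age):
    """Hypothetical binning: one bin per decade, with ages 80+ sharing bin 8."""
    return min(age // 10, NUM_AGE_BINS - 1)


def to_chw(image_hwc):
    """Convert a uint8 (64, 64, 3) image into a float32 (3, 64, 64) array in [0, 1]."""
    return np.transpose(image_hwc, (2, 0, 1)).astype(np.float32) / 255.0


def one_hot_conditions(age_bins, genders, races):
    """Concatenate one-hot age bin, gender, and race codes: 9 + 2 + 5 = 16 dims."""
    n = len(age_bins)
    c = np.zeros((n, NUM_AGE_BINS + NUM_GENDERS + NUM_RACES), dtype=np.float32)
    rows = np.arange(n)
    c[rows, np.asarray(age_bins)] = 1.0
    c[rows, NUM_AGE_BINS + np.asarray(genders)] = 1.0
    c[rows, NUM_AGE_BINS + NUM_GENDERS + np.asarray(races)] = 1.0
    return c


def sample_minority_inputs(n, rng):
    """Latent codes plus conditions with a non-white race and random gender and age."""
    races = rng.integers(1, NUM_RACES, size=n)       # 1..4, never 0 (White)
    genders = rng.integers(0, NUM_GENDERS, size=n)
    age_bins = rng.integers(0, NUM_AGE_BINS, size=n)
    z = rng.standard_normal((n, LATENT_DIM)).astype(np.float32)
    return z, one_hot_conditions(age_bins, genders, races)
```

For example, `z, c = sample_minority_inputs(8410, np.random.default_rng(0))` would yield 8,410 latent/condition pairs to pass through a trained decoder.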

  • Built a GAN with generator and discriminator networks
  • Selected generated images whose feature vectors matched the attributes of minority groups
  • Sampled from the latent distribution to produce another 8,410 images
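The selection step can be illustrated with a short filter. This sketch assumes, hypothetically, that every generated image already has predicted race and gender labels (for instance from an auxiliary attribute classifier, which the poster does not describe). It keeps only samples whose (race, gender) pair belongs to a chosen set of minority subgroups and stops at a target count such as 8,410.

```python
import numpy as np


def select_minority_indices(pred_race, pred_gender, minority_groups, target_count):
    """Return indices of generated samples whose predicted (race, gender) is a minority subgroup.

    pred_race, pred_gender: integer labels per generated image (UTKFace coding assumed).
    minority_groups: set of (race, gender) pairs to keep.
    target_count: maximum number of indices to return.
    """
    pairs = zip(np.asarray(pred_race).tolist(), np.asarray(pred_gender).tolist())
    keep = np.array([p in minority_groups for p in pairs], dtype=bool)
    return np.flatnonzero(keep)[:target_count]
```

In practice the generator would be sampled in batches and this filter applied repeatedly until enough indices had been collected.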

  • Performed 5 tests:
    • Baseline: trained the CNN to predict gender on 4,187 sampled images with an imbalanced distribution
    • Random oversampling: added 8,410 duplicated images of minority groups and retrained the CNN
    • DB-VAE: implemented the publicly available DB-VAE network, which re-weights data points of minority groups using learned latent structure [2]
    • VAE balancing: trained and tested the CNN with real and synthetic VAE faces
    • GAN balancing: trained and tested the CNN with real and synthetic GAN faces
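Two of the baselines above can be sketched in NumPy. `random_oversample` duplicates randomly chosen minority images (drawn with replacement) until `n_extra` copies have been added, as in the oversampling test. `dbvae_sampling_probs` follows the re-weighting idea described for DB-VAE [2]: each latent dimension's distribution is estimated with a histogram, and a sample's resampling probability is proportional to the product over dimensions of 1 / (density + alpha), so images in rare regions of the latent space are drawn more often. The function names, bin count, and alpha are assumptions; this is not the released DB-VAE code.

```python
import numpy as np


def random_oversample(X, y, minority_mask, n_extra, rng):
    """Append n_extra duplicates of minority samples, drawn with replacement."""
    minority_idx = np.flatnonzero(minority_mask)
    picks = rng.choice(minority_idx, size=n_extra, replace=True)
    return np.concatenate([X, X[picks]]), np.concatenate([y, y[picks]])


def dbvae_sampling_probs(mu, bins=10, alpha=1e-3):
    """Resampling probabilities from latent means mu of shape (n_samples, latent_dim).

    weight(x) is proportional to prod_i 1 / (density_i(mu_i(x)) + alpha),
    computed in log space to avoid underflow across many latent dimensions.
    """
    n, d = mu.shape
    log_w = np.zeros(n)
    for i in range(d):
        density, edges = np.histogram(mu[:, i], bins=bins, density=True)
        bin_idx = np.digitize(mu[:, i], edges[1:-1])  # bin index in 0..bins-1
        log_w -= np.log(density[bin_idx] + alpha)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()
```

The probabilities from `dbvae_sampling_probs` could then be passed as `p=` to `rng.choice` to draw each training batch.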

Accuracy Scores Per Subgroup

  • Used the cropped images from the UTKFace dataset (23,709 samples) for generation
  • Randomly sampled 5,000 images to train a CNN that predicts gender
  • Increased the severity of the imbalance between race and gender subgroups to produce an initially biased classifier
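The sampling and skewing steps can be sketched as follows: draw 5,000 indices without replacement, then drop each drawn image with a subgroup-specific probability so that some race and gender subgroups become rarer. The `keep_frac` values are hypothetical; the poster does not report the exact fractions used to create the imbalance.

```python
import numpy as np


def make_imbalanced_subset(race, gender, n_sample, keep_frac, rng):
    """Sample n_sample indices, then keep each with a (race, gender)-specific probability.

    keep_frac maps (race, gender) -> probability of keeping an image;
    subgroups not listed are always kept.
    """
    drawn = rng.choice(len(race), size=n_sample, replace=False)
    keep = [
        i for i in drawn
        if rng.random() < keep_frac.get((int(race[i]), int(gender[i])), 1.0)
    ]
    return np.array(keep, dtype=int)
```

For instance, `keep_frac={(1, 1): 0.2, (3, 1): 0.2}` (illustrative values only) would retain roughly a fifth of sampled Black and Indian female faces under UTKFace label coding.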

  • In this research, we show that generating synthetic images with deep learning to balance imbalanced facial datasets can significantly improve performance on minority classes
  • Up to a 32% increase in intraclass performance
  • Comparable or better results than existing debiasing approaches (DB-VAE, random oversampling)
  • Allows the dataset to be expanded while avoiding the addition of noisy data that can confuse the model
  • Future goals:
    • Experiment with more complex networks such as VAE-GANs and tune hyperparameters
    • Use image enhancement to improve the quality and diversity of generated images
    • Leverage unsupervised learning (e.g. clustering) to find minority groups of images without provided attributes and features
    • Apply the model to other tasks such as multiclass classification (race, age, etc.)

Abstract

Data

Methodology

Results

Conclusion/Future Work

References

Introduction

[1] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. 77–91.

[2] Alexander Amini, Ava P. Soleimany, Wilko Schwarting, Sangeeta N. Bhatia, and Daniela Rus. 2019. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. In 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES’19), January 27–28, 2019, Honolulu, HI, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3306618.331424

[3] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357.

[4] V. Sampath, I. Maurtua, J. J. Aguilar Martin, and A. Gutierrez. 2021. A survey on generative adversarial networks for imbalance problems in computer vision tasks. Journal of Big Data. https://doi.org/10.1186/s40537-021-00414-0 (retrieved August 16, 2022)

[5] Diederik P. Kingma and Max Welling (2019), “An Introduction to Variational Autoencoders”, Foundations and Trends in Machine Learning: Vol. xx, No. xx, pp 1–18. DOI: 10.1561/XXXXXXXXX.

[6] Ian Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems. 2672–2680.

Acknowledgements

  • Data bias in face datasets has caused many facial recognition algorithms to perform poorly on certain racial and gender subgroups
  • Research has shown that face detection software from companies such as Amazon, Facebook, IBM, and Microsoft is biased against people of color and women [1]
    • Up to a 34.4% difference between white male and black female accuracies
  • Dataset imbalance is a common cause of bias → current techniques, including oversampling and downsampling, aim to restore balance [2], [3]
    • These risk losing data (downsampling) or overfitting when underrepresented classes are duplicated repeatedly (oversampling)
  • In recent years, newer unsupervised generative techniques such as VAEs and GANs have proven successful at creating realistic images from a set of training data [4]
    • Already used in medical settings [4]
  • Current research has explored the effectiveness of these models, but their ability to reduce bias, as measured with robust fairness metrics, has yet to be investigated

We would like to thank Dr. Mui for his guidance and the Aspiring Scholars Directed Research Program (ASDRP) for providing us the opportunity to conduct artificial intelligence research this year.

Example faces: Black male, Asian female, Black male, Indian female

  • Underrepresentation → bias against women and people of color
  • Larger gaps in gender classification accuracy for non-white faces
  • Debiasing VAE (DB-VAE) → decreased gap in accuracies between subgroups but caused a ~10% drop in overall accuracy
  • Duplication → overfitting on minority images
    • Increased gap in accuracy between male and female classes
  • Balancing with VAEs → decreased gap in accuracies between male and female classes with a minimal drop in overall accuracy
  • Balancing with GANs → largely similar results, with increased performance for black and Indian females

Test                    Standard Deviation
Imbalanced Data         17.67
Oversampled Data        31.94
Data with VAE Faces     6.92
Data with GAN Faces     11.91
DB-VAE                  11.28

  • Calculated the standard deviation of the subgroup accuracies for each test
    • A lower value indicates less bias
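The fairness metric described above can be computed as in this sketch: accuracy (in percent) is computed separately for each race and gender subgroup, and the spread of those accuracies is summarized with a standard deviation. The poster does not state whether the population (`ddof=0`) or sample (`ddof=1`) standard deviation was used, so it is left as a parameter; the function name is hypothetical.

```python
import numpy as np


def subgroup_accuracy_std(y_true, y_pred, subgroup, ddof=0):
    """Standard deviation of per-subgroup accuracy, in percentage points.

    subgroup holds one label per example (e.g. a 'race_gender' string).
    Returns (std, {subgroup: accuracy_percent}).
    """
    y_true, y_pred, subgroup = map(np.asarray, (y_true, y_pred, subgroup))
    accs = {}
    for g in np.unique(subgroup):
        mask = subgroup == g
        accs[g.item()] = 100.0 * np.mean(y_true[mask] == y_pred[mask])
    return float(np.std(list(accs.values()), ddof=ddof)), accs
```

A lower returned value means accuracies are more uniform across subgroups, which is the sense in which the table treats it as less bias.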
