Collaborative and private AI model training to combat algorithmic bias
Val & Yuriko, Community Privacy Residency 2025
What is federated learning or client-side model training?
In traditional machine learning, all of the training data is typically collected into one central database before a model is trained.
Federated learning (often referred to as collaborative learning) is a decentralized approach to training machine learning models: multiple data owners train a shared model together without ever sharing their raw datasets. To further strengthen privacy, secure multi-party computation can be layered on top of federated learning to protect the communication and computation that happen during training.
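To make this concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated learning algorithm: each data owner trains on its own data locally and only shares model weights, which a coordinator then averages. Everything here (the linear model, the toy data, the client sizes) is hypothetical and purely for illustration; real deployments use frameworks like Flower or TensorFlow Federated and add protections such as secure aggregation and differential privacy.

```python
import numpy as np

# Hypothetical setup: each data owner ("client") trains a simple linear
# model locally and only shares model weights -- never its raw examples.

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps on its
    own private data (linear regression, purely for illustration)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(global_weights, clients, rounds=10):
    """Coordinator loop: send the global weights out, then average the
    returned weights, weighted by each client's dataset size (FedAvg)."""
    w = global_weights
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in clients]
        sizes = [len(y) for _, y in clients]
        w = np.average(updates, axis=0, weights=sizes)
    return w

# Toy example with two hypothetical data owners of different sizes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (200, 50):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

print(federated_averaging(np.zeros(2), clients))  # approx. [2.0, -1.0]
```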
What makes a good use case for federated learning?
Existing use cases for federated learning
Going to bed after a fun day hanging out with all my new friends from the Community Privacy Residency
Why can’t I stop thinking about collaborative & private model training for mitigating algorithmic bias?!?!!?!?!
A primary cause of algorithmic bias is a lack of diversity in training data
“Because the algorithm used the results of its own predictions to improve its accuracy, it got stuck in a pattern of sexism against female candidates.”
Types of Bias in Algorithms
👎🏻 Sexism: Bias against women and marginalized genders
👎🏼 Racism: Bias or discrimination against racial minorities or BIPOC folks
👎🏽 Homophobia & Transphobia: Bias against LGBTQIA+ community
👎🏾 Ableism: Bias against folks with disabilities
👎🏿 Fatphobia: Bias against fat people
👎🏻 Whorephobia: Bias against whores / sex workers
👎🏼 Language Bias: Bias against people who speak with non-dominant accents, dialects, or languages
Why is mitigating algorithmic bias a good use case for federated learning?
Clear incentives to collaborate on both sides, combined with a need to keep the underlying data private
By working together, organizations can learn from more diverse datasets across multiple sources, which helps ensure that the model sees a wide range of experiences and perspectives. This diversity is crucial because bias often arises when a model is trained on a dataset that is too homogeneous or unrepresentative of the wider population.
When sensitive data is used to train models, individuals may be hesitant to participate due to privacy concerns. This can lead to an underrepresentation of certain groups. With federated learning, users can contribute data without exposing it, which encourages wider participation.
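As a rough illustration (all numbers and group labels below are made up), one common first-pass bias check is the demographic parity difference: the gap in positive-prediction rates across groups. In a federated setting, each organization could compute these group-level rates on its own private data and share only the aggregates.

```python
import numpy as np

def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest positive-prediction rate across
    groups (0 means parity). A common first-pass bias check."""
    rates = {str(g): float(predictions[groups == g].mean())
             for g in np.unique(groups)}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical hiring-model outputs: 1 = "advance the candidate".
preds  = np.array([1, 1, 0, 1, 1, 0, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

gap, rates = demographic_parity_difference(preds, groups)
print(rates)  # {'A': 0.8, 'B': 0.2}
print(gap)    # 0.6 -> a large gap, worth investigating
```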
Regulatory Pressure, e.g. the EU AI Act
The EU AI Act, passed in 2024, is the world’s most comprehensive AI regulation so far. It requires mandatory bias and risk assessments for AI systems in high-risk areas like hiring, healthcare, policing, and banking.
Companies must demonstrate that their training data is representative and fair.
Non-compliance can result in fines of up to 7% of global annual turnover for the most serious violations.
This is the first time that legal liability for algorithmic bias is being tied directly to corporate revenue.
Non-profit and Business (For-profit)
Example: Hiring algorithm
Two potential collaborating entities:
Datasets from a local community-based job placement center can be used to help correct Indeed’s model so that it stops discriminating against job-seekers from under-represented communities. But these sensitive datasets need to be kept private from the other entity.
Non-profit and Business (For-profit)
Example: Content moderation & Detection algorithms
Two potential collaborating entities:
Datasets from Lips can be used to help correct Instagram’s model so that it stops discriminating against queer people and sex workers. But the Lips and Instagram datasets need to be kept private from each other. This is especially important for sex workers, whose privacy is at particular risk!
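A hedged sketch of how "kept private from each other" can work technically: secure aggregation via pairwise masking, one of the multi-party computation building blocks mentioned earlier. Each party adds a random mask to its model update before sending it, and the masks are constructed to cancel in the sum, so the coordinator only ever sees the combined update. The party names, update values, and shared mask here are hypothetical; real protocols (e.g., Bonawitz et al.'s secure aggregation) handle key agreement, dropouts, and many parties.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical local model updates (e.g., gradients) from two parties.
update_lips      = np.array([0.3, -1.2, 0.7])
update_instagram = np.array([-0.5, 0.4, 1.1])

# The two parties agree on a shared random mask (in practice derived from
# a key exchange such as Diffie-Hellman, never sent in the clear).
mask = rng.normal(size=3)

# Each party sends only its masked update to the aggregator.
masked_lips      = update_lips + mask
masked_instagram = update_instagram - mask

# The aggregator sums the masked updates: the masks cancel, so it recovers
# the total without ever seeing either raw update.
aggregate = masked_lips + masked_instagram
print(aggregate)                       # equals the true sum
print(update_lips + update_instagram)  # same values
```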
Alternative Data Governance Approaches
Areas for further research
Interviews!
✅ Joseph Lacey (SysOps for nonprofits & vulnerable communities)
✅ Rohini (Researcher & Technologist, Non-consensual image abuse expert)
✅ Elo (Applied cryptography for private credentials)
✅ Rudy Fraser (Developer of BlackSky Algorithms - Black creator feeds for Bluesky)
✅ Annie Brown (Founder of Reliabl.ai, Algorithmic bias researcher)
✅ Duncan McElfresh (Machine learning engineer)
✅ io (Cybersecurity expert for vulnerable communities)
✅ Josh Tan (Public AI advocate)
✅ Luke Miller (Community governed AI Developer)
✅ Josue Guillen (Texas Organizing Project, The Movement Co-operative)
💚 Sonam Jindal (Partnership on AI, met at RightsCon)
💚 Dr. Carolina Are (Blogger on a Pole, Algorithmic bias researcher & sex worker)
The more I learn, the more questions I have :)
Therefore, this presentation is a smattering of random but relevant rabbit holes I went down this week, each of which brought me both closer to and farther from an understanding of potential use cases for private and collaborative AI model training.
Let’s go!
Existing Federated Learning Use Cases
Private & Federated Learning in Healthcare
Examples of algorithms where these biases are commonly found:
“To be included or not be included, THAT is the question…”
Government AI Procurement
“When government AI systems base their determinations on biased data, their outputs can perpetuate harmful biases and strip marginalized beneficiaries of the government benefits they deserve.” - Outsourced and Automated
The report notes that governments are increasingly adopting AI systems due to growing pressure to meet demand for public and social services (a consequence of austerity)...
Further research into anti-discrimination laws in domains where AI use is common:
Proxy Discrimination and the Limits of Legal Anti-Discrimination Law
“The continued evolution of AI and big data will cause proxy discrimination to increase substantially whenever anti-discrimination law seeks to prohibit the use of characteristics that are directly predictive of risk…
For these reasons, anti-discrimination laws that prohibit discrimination based on directly predictive characteristics must adapt to combat proxy discrimination in the age of AI and big data.”
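To see what proxy discrimination looks like in code, here is a small, entirely synthetic sketch: even though the protected attribute is dropped from the training features, a correlated "neutral" feature (a made-up neighborhood code) lets the model reproduce the historical disparity anyway.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

# Synthetic world: 'group' is the protected attribute; 'neighborhood' is a
# facially neutral feature that happens to correlate strongly with it.
group        = rng.integers(0, 2, size=n)
neighborhood = np.where(rng.random(n) < 0.9, group, 1 - group)
skill        = rng.normal(size=n)

# Historically biased outcomes: group 1 was hired far less often at the
# same skill level.
p_hire = 1 / (1 + np.exp(-(skill - 1.5 * group + 0.5)))
hired  = (rng.random(n) < p_hire).astype(int)

# Train WITHOUT the protected attribute -- only skill and neighborhood.
X = np.column_stack([skill, neighborhood])
model = LogisticRegression().fit(X, hired)
pred = model.predict(X)

# The disparity persists, because neighborhood acts as a proxy for group.
for g in (0, 1):
    print(f"group {g}: predicted hire rate = {pred[group == g].mean():.2f}")
```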
Key Takeaways
Model or No Model?
👍🏻 LLMs (ideally public AI)
👍🏼 Content moderation (automatic image detection for extreme content that no human should have to review or be exposed to)
👍🏽 Recommendation models for information or products
👍🏾 Healthcare: disease prediction
👍🏿 Decentralized personal AI assistants (digital twins)
*In these examples, we may want to use AI and therefore be invested in making these systems more inclusive, fair, and accurate.
👎🏻 Social services determinations
👎🏾 Facial or body recognition systems for surveillance
👎🏿 Income, employment, or credit verification
*For these examples, we might consider how relying exclusively on manual processes also enables bias, and perhaps solutions require some combination of manual review and fairness-ensuring automation.
Emerging Legal Frameworks
Emerging Community Data Governance Protocols & Practices
Areas for further research
Thoughts? Questions? Concerns?
Appendix