Research Portfolio
Ishika Ray, PhD ABD
Table of Contents
About me
Arapahoe County IT Service Catalog
Incorporating user feedback into the consolidation of Arapahoe County IT Catalog
AI language models generate biased images
[ACADEMIC] Project that highlights bias in AI language models
Case 1
Case 2
About me
I am a behavioral researcher with a focus on understanding user perceptions through collaborative design and iterative empirical testing. I grew up in Kolkata, India and am currently based out of Seattle, WA.
I was trained in the University of Washington’s Psychology PhD program, specializing in experimental psychology and data science. I have extensive training in both qualitative and quantitative methods. In particular, I enjoy uncovering user needs through usability testing, user surveys, and in-depth interviews.
At Deloitte, I helped drive a user focus in an engineering-centric project. As a business analyst, I focused on advocating for user-friendly design changes that aligned with the functional enhancement needs of our stakeholders.
I am currently writing my dissertation on the pitfalls of collecting racial information without collecting qualitative data that offers researchers context for the respondents’ ethnocultural backgrounds.
A rare sighting of UW Psychology’s Shodalab
My research approach
My research approach is rooted in a sequential mixed-methods design.
I begin with qualitative inquiry to deeply understand the problem space. This informs the subsequent quantitative phase, where I measure and analyze key variables to deliver data-driven insights and recommendations.
I advocate for this iterative, mixed-methods approach throughout the project lifecycle to ensure quantifiable results are grounded in rich, contextual data.
Methods
Below are a few methods I use to collect and analyze data. I select the method based on the problem we are trying to solve and the phase of product development.
| | Qualitative | Quantitative |
| --- | --- | --- |
| Input | Interviews*, Moderated usability testing*, Unmoderated usability testing, Card-sorting | Benchmark usability testing, Survey*, First-click, Card-sorting* |
| Output | Thematic analysis*, Jobs-to-be-done*, Affinity diagramming*, Personas* | Usability metric comparison*, Statistical analysis (linear regression, mixed-effects models in R, SPSS, Excel)*, Heat-mapping |
*indicates greatest strengths
Arapahoe County IT Service Catalog
Team: Ishika Ray (Product Management Fellow, UX Researcher), R. Caldwell (Supervisor, Business Analyst)
Project Duration: 10 weeks
Role: As a Coding it Forward Summer Fellow, I worked on an orphaned project to improve business processes within the Arapahoe IT Department. I was the sole contributor to this project and reported to my supervisor, R. Caldwell.
Skills Highlighted:
Arapahoe County and IT Department Overview
Case 1: IT Catalog Project & Objective
Note: this project is more confidential so fewer details can be shared.
Problem Statement: The IT department faced inefficiencies in managing their IT application catalogs across multiple sources, leading to delays and redundant processes. This caused confusion and frustration among employees who needed to access and utilize these applications. IT leadership wanted to implement an application rationalization plan but were faced with resistance from stakeholders within IT divisions.
Solution: To address these challenges, I developed a centralized IT application catalog using Airtable. This solution aimed to simplify SaaS license management, improve decision-making, and enhance overall efficiency within the IT department. To encourage stakeholder buy-in for application rationalization, I interviewed the stakeholders themselves. Their uncertainties were addressed directly in my presentation and a memo to IT leadership.
Case 1: Project Timeline
To complete this project within the 10-week fellowship, I pursued two goals:
IT Catalog Phase I: Defining User Priorities
The first phase was to define which tasks were most critical to our users. The two main objectives of this project were already outlined:
Define User Priorities
Catalog Consolidation (Data science component)
User Interviews and IT Catalog Research
Deliverables
Phase II: Catalog Consolidation
Catalog consolidation was by far the most daunting data cleaning task of this project. To create a single list of applications, I gathered artifacts from multiple sources, including:
I concatenated these files using R. All data cleaning (including deduplication of records) was completed using R Tidyverse. The final CSV file, containing a list of 953 applications, was uploaded to Airtable.
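The original consolidation was done with R Tidyverse scripts that are not reproduced here. As an illustration only, below is a minimal Python/pandas analogue of the same workflow — stack the per-division lists, normalize application names, and drop duplicates. The file paths and the `application` column name are hypothetical placeholders, and the lowercase/whitespace normalization rule is an assumption about how near-duplicate records were matched.

```python
# Illustrative Python analogue of the R/Tidyverse consolidation workflow.
# Column name "application" and the normalization rule are assumptions.
import pandas as pd

def consolidate_catalogs(paths):
    """Concatenate per-division application lists and deduplicate them."""
    frames = [pd.read_csv(p) for p in paths]
    combined = pd.concat(frames, ignore_index=True)
    # Normalize names so variants like "Zoom " and "zoom" collapse together.
    combined["application"] = combined["application"].str.strip().str.lower()
    deduped = combined.drop_duplicates(subset="application")
    return deduped.reset_index(drop=True)
```

In the R original, the equivalent steps would be `bind_rows()` followed by `distinct()` on a normalized name column.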
Phase III: Comparative analysis
WHY AIRTABLE?
The next step of this project required me to conduct a comparative analysis of three products accessible to county workers, evaluating (i) usability without training, (ii) cost, and (iii) ITSM capabilities.
Of these constraints, usability was a top priority, since it was one of the main barriers for senior IT employees to migrate to a single, integrated ITSM platform.
Despite its lack of specialized cataloging features, I recommended migrating the initial database to Airtable because its interface most closely resembles MS Excel (where most of the original artifacts were housed). Airtable also offers the most flexible system for customizing tags to group applications. The tagging system would make it easier for the IT Service Desk, application managers, and business analysts to approve or reject new application and license requests.
*This step was conducted simultaneously with user interviews. Some of the decision-making related to this stage is based on data collected during the interviews.
Phase III: User Interviews
While the initial project scope did not include a dedicated UXR component, I determined that incorporating user feedback was essential to achieve two key outcomes: (i) securing stakeholder buy-in and (ii) designing a single sustainable database that could be maintained across IT divisions.
To achieve this, I interviewed 9 IT employees across the 4 IT divisions, including (i) individual contributors such as Service Desk representatives, server managers, and business analysts, and (ii) managers such as network managers and application managers.
I conducted 45-minute, semi-structured interviews over Zoom. Participants were asked the following questions:
Phase III: User Interview Results
I consolidated my findings in an affinity map using Miro. I uncovered five emergent themes, which I then reframed as recommendations for IT leadership to inform a county-wide application rationalization effort*:
*Application rationalization is a strategic process undertaken by organizations to optimize their application portfolio and improve operational efficiency.
Phase III: User Interview Recommendations I
Based on the emergent themes from user interviews, I made the following recommendations:
Phase III: User Interview Recommendations II
Based on the emergent themes from user interviews, I made the following recommendations:
Phase III: Overall Recommendations
Periodic Audit of Non-IT County Departments:
IT employees would benefit from audits of other departments to account for duplicate and/or unauthorized licenses
Multi-Stakeholder Catalog Maintenance:
IT catalog data should be updated concurrently by stakeholders who are directly involved in the business process of acquiring or removing services from the county’s IT portfolio
Application Deduplication in Rationalization Exercise:
Currently, the department has an equal number of employees and service licenses!
Phase IV: Deliverables
The primary goals of this project were to create a consolidated IT service database, and increase stakeholder buy-in. My deliverables included:
Project Impact Summary
Research Wishlist
In the absence of resource constraints, here is what I would do differently:
AI language models generate biased images
Team: Ishika Ray (Principal Researcher)
Project Duration: ~2 months (cumulative)
Role: I worked on this package of studies as a part of my doctoral dissertation. I was the sole researcher on this project and completed all research tasks from end-to-end.
Although this project began as an academic venture, I soon realized that it could be used to build more ethical AI models, especially in the realm of image generation. This is why I have included this project in my research portfolio.
Skills Highlighted:
Project Overview
Objective: To investigate and address biases in AI-generated images of “Asians”.
Key Concerns: Training data biases, algorithmic reinforcement, societal stereotypes, and underrepresentation.
Research Findings: AI models tend to favor “prototypical” demographic groups when generating images, reinforcing societal biases.
Implications: Biased AI outputs can [further] perpetuate discrimination and misinformation in visual media, demography, and social scientific research.
Proposed Solutions: Enhancing dataset diversity, implementing bias audits, and increasing transparency in AI development.
Future Research: Further exploration of diverse training datasets and user-driven AI customization options.
Case 2: Why Should We Care?
The use of generative AI by a broad audience is on the rise, making it important to train AI models to minimize harmful outcomes (Hendrycks et al., 2024), such as:
This project focuses specifically on the first danger, viz. the perpetuation of biases and stereotypes by AI language models.
Just as this AI-generated image does not have any intelligible words, AI images of people are also susceptible to errors
Note: this project is less confidential so real artifacts are provided.
Hendrycks, D., Mazeika, M., & Woodside, T. (2024). Overview of catastrophic AI risks. Introduction to AI Safety, Ethics, and Society, 3-50. https://doi.org/10.1201/9781003530336-1
Case 2: Why Should We Care?
The generated images often reflect biases from training data and societal norms. For instance:
I generated this image of “a doctor and a nurse” using ChatGPT. Much like the 20 other iterations, the language model generated an image of a white male doctor and a white female nurse (Birhane et al., 2021).
Birhane, A., et al. (2021). The Role of Bias in AI Image Generation. AI & Society Journal.
Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Advances in Neural Information Processing Systems.
Criado-Perez, C. (2019). Invisible Women: Data Bias in a World Designed for Men. Abrams Press.
Raji, I. D., et al. (2020). Saving Face: Investigating the Ethical Concerns of Facial Recognition Datasets. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
West, S. M., et al. (2019). Discriminating Systems: Gender, Race, and Power in AI. AI Now Institute.
Yang, S., et al. (2020). The Impact of Training Data on AI Fairness. Journal of AI Research.
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-Level Constraints. Empirical Methods in Natural Language Processing
Case 2: Research Question
When I began working on this project, I was mainly interested in the research question: How do people interpret the word “Asian”?
In my own experience, my identity as an Asian has often been questioned in the United States. Americans are less likely to consider Indians (amongst others) to be Asians, and people typically default to the idea that East Asian = Asian (Lee & Ramakrishnan, 2020; Goh & McCue, 2021).
These findings make this project uniquely personal to me, but its benefits extend to a much, much larger population.
Goh, J. X., & McCue, J. (2021). Perceived prototypicality of Asian subgroups in the United States and the United Kingdom. Journal of Experimental Social Psychology, 97, 104201.
Lee, J., & Ramakrishnan, K. (2020). Who counts as Asian. Ethnic and Racial Studies, 43(10), 1733-1756.
A(n AI-generated) collage of quotes that I have encountered when I have identified as an “Asian.” The typos are AI-generated as well, despite my error-free prompts.
Case 2: Hypothesis
Initially, I ran 3 tasks to test the hypothesis that:
Participants who are asked to classify portraits of people into conventional racial groups (i.e. Asian, Black, White, and the additional ethnic group of Latino) are more likely to categorize East Asian faces as Asian, in comparison to Southeast Asian and South Asian faces.
Conversely, participants who are given additional information about the social category of ‘Asian’ are more likely to categorize faces of Southeast- and South Asian-appearing faces as Asian.
I initially conducted these studies on human participants, and then tried to replicate my findings on AI models.
Case 2: Methods for human participants
I created a closed card sorting task, where participants were asked to categorize photos of 96 men (sourced from the Chicago Face Database/CFD) into various categories.
Task 1: 100 participants were asked to sort faces into one of four “conventional” categories of race. This task was a conceptual replication of the methods used by CFD researchers to create their standardized database.
Task 2: 97 participants were asked to sort faces into one of two categories: Asian, or non-Asian.
Task 3: 85 participants were asked to sort faces as either: East Asian, Southeast Asian, South Asian, Central Asian, other Asian, or non-Asian. Additionally, participants were primed with information on the national origins of different populations who may identify as Asian. For each category, this information was added in parentheses; e.g. Central Asians (people who trace their origins to Kyrgyzstan, Tajikistan, etc.)
Screengrab of Task 3
Case 2: Results
Our results showed some clear trends:
Case 2: What does this mean?
Running these tasks demonstrated 3 key points:
Case 2: Replicating Findings – Generating images with AI
Academic research is often perceived to be too theoretical to be applied to industry research. To demonstrate its effects on real-world products, I picked two AI image generators to test my research question:
How do AI image generators respond to reminders of Asian diversity?
To demonstrate the amplification of the “East Asian = Asian” effect in AI language models, I compared ChatGPT’s Dall-E and Gemini’s Imagen 3 in generating images of an Asian individual.
I treated this as a usability test to assess whether I (the user) could prompt an AI model to generate a non-prototypical image of an Asian person.
Both models prompted me to provide additional information about the individual whose image was to be generated. In the absence of further context, however, both produced the image of an East Asian man, in keeping with the colloquial perception of Asians and Asianness.
Case 2: Replicating Findings – Generating AI Images with Diversity Reminders
To test whether AI language models can produce different results (i.e. non-East Asian faces) based on additional instructions, I used the same prompts from my card sorting Task 3 and compared the results. I made sure to clear all preferences and memories from these models to minimize personalized outputs. Unsurprisingly, both models produced East Asian-appearing results.
Output from ChatGPT using Dall-E
Output from Gemini using Imagen 3
Case 2: Replicating Findings from ChatGPT
Next, I re-ran these prompts as part of the same conversation. That is, once the AI model had:
(i) generated the image of an ‘Asian person’, I prompted it to:
(ii) generate another image of an Asian person while keeping in mind the breadth of Asian diversity.
ChatGPT attempted to improve on its previous images by presenting images of Asian women in seemingly Central Asian attire. When asked about its logical reasoning, ChatGPT/Dall-E offered a number of factors it had taken into consideration, including my reminder of Asian diversity.
Case 2: Replicating Findings from Gemini
In the case of Gemini, it appeared that the reminder of Asian diversity had less of an effect on Imagen 3’s decision to portray a stereotypically East Asian individual. When I asked the model to explain its logic, it could not provide any.
Despite this, when I asked the same question of the Gemini 2.0 Flash Thinking Experimental model, it provided a step-by-step sequence of reasoning that did not entirely match the output.
Case 2: Overview of Findings
Case 2: Again, Why Should We Care?
It was apparent that both Dall-E and Imagen 3 were resistant to creating non-prototypical Asian images. This supports conclusions drawn by previous research, which attributes the amplification of societal biases to factors such as (i) a lack of diversity in training data sets, and (ii) a lack of diversity in the teams that develop AI models.
Two ways to combat these biases include: (i) oversampling of underrepresented demographics in order to account for the lack of non-prototypical data points, and (ii) the recruitment of diverse AI development teams. This includes the recruitment of people from diverse ethnoracial backgrounds, as well as people from social science backgrounds who can offer recommendations to nudge AI models to be more inclusive.
It may be argued that AI models will self-correct their biases over time, when learning from human users. However, this course correction may take years to take effect.
Applications in Industry
Project Impact Summary
Questions?
Email: ishikaray8@gmail.com