1 of 38

Research Portfolio

Ishika Ray, PhD ABD


Table of Contents

About me

  • About me

  • Research approach

  • Methods

Arapahoe County IT Service Catalog

Incorporating user feedback into the consolidation of Arapahoe County IT Catalog

AI language models generate biased images

[ACADEMIC] Project that highlights bias in AI language models

Case 1

Case 2


About me

I am a behavioral researcher with a focus on understanding user perceptions through collaborative design and iterative empirical testing. I grew up in Kolkata, India and am currently based out of Seattle, WA.

I was trained in the University of Washington’s Psychology PhD program, in experimental psychology and data science. I have extensive training in both qualitative and quantitative methods. In particular, I enjoy uncovering user needs through usability testing, user surveys, and in-depth interviews.

At Deloitte, I helped drive a user focus in an engineering-centric project. As a business analyst, I focused on advocating for user-friendly design changes that aligned with the functional enhancement needs of our stakeholders.

I am currently writing my dissertation on the pitfalls of collecting racial information without collecting qualitative data that offers researchers context for the respondents’ ethnocultural backgrounds.

A rare sighting of UW Psychology’s Shodalab


My research approach

My research approach is rooted in a sequential mixed-methods design.

I begin with qualitative inquiry to deeply understand the problem space. This informs the subsequent quantitative phase, where I measure and analyze key variables to deliver data-driven insights and recommendations.

I advocate for this iterative, mixed-methods approach throughout the project lifecycle to ensure quantifiable results are grounded in rich, contextual data.


Methods

Below are a few methods I use to collect and analyze data. I select the method based on the problem we are trying to solve and the phase of product development.

Input

  • Qualitative: Interviews*, Moderated usability testing*, Unmoderated usability testing, Card-sorting

  • Quantitative: Benchmark usability testing, Survey*, First-click, Card-sorting*

Output

  • Qualitative: Thematic analysis*, Jobs-to-be-done*, Affinity diagramming*, Personas*

  • Quantitative: Usability metric comparison*, Statistical analysis (linear regression, mixed-effects models in R, SPSS, Excel)*, Heat-mapping

*indicates greatest strengths


Arapahoe County IT Service Catalog

Team: Ishika Ray (Product Management Fellow, UX Researcher), R. Caldwell (Supervisor, Business Analyst)

Project Duration: 10 weeks

Role: As a Coding it Forward Summer Fellow, I worked on an orphaned project to improve business processes within the Arapahoe IT Department. I was the sole contributor to this project and reported to my supervisor, R. Caldwell.

Skills Highlighted:

  • Moderated Interviews (Group and Individual)
  • Affinity Mapping
  • Data Cleaning (R/RStudio, Airtable)
  • Comparative Analysis


Arapahoe County and IT Department Overview

  • The Arapahoe County Information Technology (IT) Department is the strategic partner that develops, supports, and delivers technology services to other county departments and the public of Arapahoe County, Colorado.
  • Manages county's technology infrastructure, including data centers, networks, and applications.
  • Offers a wide range of services, including system administration, cybersecurity, and user support.
  • Annual expenditure: $61 million
  • Number of employees: 1000
  • Four divisions
  • Population served: 656,061


Case 1: IT Catalog Project & Objective

Note: this project is more confidential so fewer details can be shared.

Problem Statement: The IT department faced inefficiencies in managing their IT application catalogs across multiple sources, leading to delays and redundant processes. This caused confusion and frustration among employees who needed to access and utilize these applications. IT leadership wanted to implement an application rationalization plan but was met with resistance from stakeholders within IT divisions.

Solution: To address these challenges, I developed a centralized IT application catalog using Airtable. This solution aimed to simplify SaaS license management, improve decision-making, and enhance overall efficiency within the IT department. To encourage stakeholder buy-in for application rationalization, I interviewed the stakeholders themselves and addressed their uncertainties directly in my presentation and memo to IT leadership.


Case 1: Project Timeline

To complete this project within the 10-week fellowship, I pursued two goals:

  1. Data Science component: Consolidating the IT applications database before county-wide application rationalization efforts. To complete this goal, I (i) retrieved IT applications catalog artifacts from the 4 divisions within the IT department, (ii) researched the best ITSM software available to the county, and (iii) consolidated, deduplicated, and organized the application catalog.

  2. UX Research component: Recommending an improved, ITIL-compliant application rationalization plan that incorporated feedback from users. To get stakeholder buy-in and implement user-friendly changes to the ITSM protocol, I interviewed 9 users across 4 IT divisions to document their user goals, pain points, and uncertainties surrounding application rationalization efforts.


IT Catalog Phase I: Defining User Priorities

The first phase was to define which tasks were most critical to our users. The two main objectives of this project were already outlined:

  1. Create a consolidated database of IT application purchases and licenses
    1. IT employees need a single data source of truth of county applications
    2. A database that can be edited by different division leaders
    3. Database needs to be housed on a platform that does not require extensive training to use
  2. Identify data that the current ITSM software does not house
    • County employees often purchase licenses without first consulting the IT department. My goal was to minimize the occurrence of duplicate purchases
    • Minimize touch points between application requests and the approval/denial of application purchase

Define User Priorities

Catalog Consolidation (Data science component)

User Interviews and IT Catalog Research

Deliverables


Phase II: Catalog Consolidation


Catalog consolidation was by far the most daunting data cleaning task of this project. To create a single list of applications, I gathered artifacts from multiple sources, including:

  1. The IT Management Software database maintained by IT application network managers
  2. Lists maintained by the IT Service Desk (i.e. list of apps that the IT department was equipped to provide support for)
  3. Files documenting ADA-compliant applications
  4. Business process maps which outlined the applications used within the process

I concatenated these files using R, and completed all data cleaning (including deduplication of records) with the tidyverse. The final CSV file contained a list of 953 applications, which was uploaded to Airtable.
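The consolidation step can be sketched as follows. This is an illustrative Python equivalent of the R/tidyverse pipeline; the artifact names and contents below are toy stand-ins, not the county's actual data or schema.

```python
def consolidate(*artifacts):
    """Concatenate application lists and keep one entry per normalized name."""
    seen, catalog = set(), []
    for artifact in artifacts:
        for name in artifact:
            key = name.strip().lower()  # collapse case/whitespace variants
            if key not in seen:
                seen.add(key)
                catalog.append(name.strip())
    return catalog

# Toy stand-ins for two of the four source artifacts:
itsm_export = ["Airtable", "MS Excel "]
service_desk = ["ms excel", "Zoom"]
print(consolidate(itsm_export, service_desk))  # → ['Airtable', 'MS Excel', 'Zoom']
```

Normalizing on a lowercased, whitespace-stripped key is what lets records like "MS Excel " and "ms excel" collapse into one entry before the catalog is uploaded.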


Phase III: Comparative analysis


WHY AIRTABLE?

The next step of this project required me to conduct a comparative analysis of three products which were accessible to county workers, in terms of (i) usability without training, (ii) cost, and (iii) ITSM capabilities.

Of these constraints, usability was a top priority, since it was one of the main barriers for senior IT employees to migrate to a single, integrated ITSM platform.

Despite its lack of specialized cataloging features, I recommended migrating the initial database to Airtable, because its interface most closely resembles MS Excel (on which most of the original artifacts were housed). Airtable also made it easiest to create custom tags for grouping applications. This tagging system would make it easier for the IT Service Desk, application managers, and business analysts to approve/reject new application/license requests.

*This step was conducted simultaneously with user interviews. Some of the decision-making related to this stage is based on data collected during the interviews.


Phase III: User Interviews


While the initial project scope did not include a dedicated UXR component, I determined that incorporating user feedback was essential to achieve 2 key outcomes: (i) securing stakeholder buy-in and (ii) designing a single sustainable database that could be maintained across IT divisions.

To achieve this, I interviewed 9 IT employees across the 4 IT divisions, including (i) individual contributors such as Service Desk representatives, server managers, and business analysts, and (ii) managers such as network managers and application managers.

I conducted 45-minute, semi-structured interviews over Zoom. Participants were asked the following questions:

  1. How do you locate/access the IT catalog?
  2. What do you like about the current IT catalog?
  3. What are some factors that discourage you from using the IT catalog?
  4. What are all the things you need to do and know in order to fill out the IT license/purchase application form as:
    1. A non-IT county worker?
    2. An IT employee?
  5. What works in the current application process (to purchase new IT licenses)?
  6. What does not work in the current application process?


Phase III: User Interview Results


I consolidated my findings in an affinity map using Miro. I uncovered five emergent themes, which I then reframed as recommendations for IT leadership to inform a county-wide application rationalization effort*:

  1. Users tended to record info in static files, but wished to move to a more collaborative, cloud-based documentation process.
  2. Users wanted to minimize touch points in the process of applying to/purchasing a new IT application license.
  3. In order to track unapproved purchases by non-IT employees, users identified the need to conduct periodic county-wide IT application audits.
  4. Users noted that the IT Department’s internal communication was inefficient, and preferred that a communication protocol be integrated into the business process map.
  5. Users reiterated the need for a single source-of-truth database that can be accessed and modified by managers across all IT divisions, in order to create a consolidated application portfolio.

*Application rationalization is a strategic process undertaken by organizations to optimize their application portfolio and improve operational efficiency.


Phase III: User Interview Recommendations I


Based on the emergent themes from user interviews, I made the following recommendations:

  • Standardize application description:
    • Users found application descriptions to be inconsistent. I recommended that product owners, server managers, and IT Service Desk managers collectively approve an application description before it is shared with non-IT department users.
  • Tag applications by function:
    • Creating Airtable application tags can help (i) standardize application descriptions across IT divisions AND county departments, and (ii) identify applications that serve duplicate functions.


Phase III: User Interview Recommendations II


Based on the emergent themes from user interviews, I made the following recommendations:

  • Consolidate stakeholder roles and processes:
    • When describing the evaluation process for an IT application, stakeholders reported differing documentation and practices for the same business processes. Further, business processes relied too heavily on tacit user knowledge. I recommended the creation of a single business process map that is approved by all stakeholders.
  • Create a unified IT application catalog:
    • I recommended the use of the Airtable IT application database that all county workers can view. This will reduce the likelihood that duplicate application purchase/license requests will be made.
    • I recommended that all IT division managers have the permission to modify this database. This will: (i) create a single source of documentation, and (ii) incorporate inter-divisional communication between IT stakeholders.


Phase III: Overall Recommendations


Periodic Audit of Non-IT County Departments:

IT employees would benefit from audits of other departments to account for duplicate and/or unauthorized licenses

Multi-Stakeholder Catalog Maintenance:

IT catalog data should be updated concurrently by stakeholders who are directly involved in the business process of acquiring/removing services from the county’s IT portfolio

Application Deduplication in Rationalization Exercise:

Currently, the department has an equal number of employees and service licenses!


Phase IV: Deliverables


The primary goals of this project were to create a consolidated IT service database, and increase stakeholder buy-in. My deliverables included:

  1. The final consolidated Airtable database of all 953 applications.
  2. A stakeholder presentation to introduce the county’s application rationalization plan – you can access the slides here.
  3. An internal memo reporting our detailed findings – this memo is protected by an NDA, but its findings are reiterated in Arapahoe County’s strategic plan.


Project Impact Summary

  • Reduced IT catalog artifacts by 94% (16 artifacts condensed to one database)
  • Addressed stakeholder uncertainties and increased buy-in from 38% to 85% (surveyed post-project by supervisor)
  • Simplified SaaS license management.
  • Enhanced overall efficiency within the IT department by reducing touchpoints from 5 stakeholders to one centralized database


Research Wishlist

In the absence of resource constraints, here is what I would do differently:

  • Interview managers from non-IT county departments to:
    • create a representative set of user personas
    • map the perspectives and pain points of users applying to the IT department for application licenses
  • Recommend the use of comprehensive IT management software such as Workday
  • Create a comprehensive application rationalization protocol


AI language models generate biased images

Team: Ishika Ray (Principal Researcher)

Project Duration: ~2 months (cumulative)

Role: I worked on this package of studies as a part of my doctoral dissertation. I was the sole researcher on this project and completed all research tasks from end-to-end.

Although this project began as an academic venture, I soon realized that it could be used to create more ethical AI models, especially in the realm of image generation. This is why I have included this project in my research portfolio.

Skills Highlighted:

  • Card sorting (Qualtrics)
  • A/B testing


Project Overview

Objective: To investigate and address biases in AI-generated images of “Asians”.

Key Concerns: Training data biases, algorithmic reinforcement, societal stereotypes, and underrepresentation.

Research Findings: AI models tend to favor “prototypical” demographic groups when generating images, reinforcing societal biases.

Implications: Biased AI outputs can [further] perpetuate discrimination and misinformation in visual media, demography, and social scientific research.

Proposed Solutions: Enhancing dataset diversity, implementing bias audits, and increasing transparency in AI development.

Future Research: Further exploration of diverse training datasets and user-driven AI customization options.


Case 2: Why Should We Care?

The use of generative AI by a broad audience is on the rise, making it important to train AI models to minimize harmful outcomes (Hendrycks et al., 2024), such as:

  • The perpetuation of biases and stereotypes
  • The dissemination of misinformation
  • Malicious use, such as the weaponization of pathogens or covert manipulation of users

This project focuses specifically on the first danger, viz. the perpetuation of biases and stereotypes by AI language models.

Just as this AI-generated image does not have any intelligible words, AI images of people are also susceptible to errors

Note: this project is less confidential so real artifacts are provided.

Hendrycks, D., Mazeika, M., & Woodside, T. (2024). Overview of catastrophic AI risks. Introduction to AI Safety, Ethics, and Society, 3-50. https://doi.org/10.1201/9781003530336-1


Case 2: Why Should We Care?

The generated images often reflect biases from training data and societal norms. For instance:

  • Underrepresentation of certain demographics in AI training datasets can lead to skewed AI output (Zhao et al., 2017; Raji et al., 2020)
  • Algorithmic biases can reinforce stereotypes, especially in professional and cultural settings (Bolukbasi et al., 2016; West et al., 2019)
  • The lack of diversity in AI development teams can contribute to biased outcomes (Criado-Perez, 2019; Yang et al., 2020)


I generated this image of “a doctor and a nurse” using ChatGPT. Much like the 20 other iterations, the language model generated an image of a white male doctor and a white female nurse (Birhane et al., 2021).

Birhane, A., et al. (2021). The Role of Bias in AI Image Generation. AI & Society Journal.

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Advances in Neural Information Processing Systems.

Criado-Perez, C. (2019). Invisible Women: Data Bias in a World Designed for Men. Abrams Press.

Raji, I. D., et al. (2020). Saving Face: Investigating the Ethical Concerns of Facial Recognition Datasets. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

West, S. M., et al. (2019). Discriminating Systems: Gender, Race, and Power in AI. AI Now Institute.

Yang, S., et al. (2020). The Impact of Training Data on AI Fairness. Journal of AI Research.

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-Level Constraints. Empirical Methods in Natural Language Processing


Case 2: Research Question

When I began working on this project, I was mainly interested in the research question: How do people interpret the word “Asian”?

In my own experience, my identity as an Asian has often been questioned in the United States. Americans are less likely to consider Indians (amongst others) to be Asians, and people typically default to the idea that East Asian = Asian (Lee & Ramakrishnan, 2020; Goh & McCue, 2021).

These findings make this project uniquely personal to me, but it serves to benefit a much, much larger population.

Goh, J. X., & McCue, J. (2021). Perceived prototypicality of Asian subgroups in the United States and the United Kingdom. Journal of Experimental Social Psychology, 97, 104201.

Lee, J., & Ramakrishnan, K. (2020). Who counts as Asian. Ethnic and Racial Studies, 43(10), 1733-1756.

A(n AI-generated) collage of quotes that I have encountered when I have identified as an “Asian.” The typos are AI-generated as well, despite my error-free prompts.


Case 2: Hypothesis

Initially, I ran 3 tasks to test the hypothesis that:

Participants who are asked to classify portraits of people into conventional racial groups (i.e. Asian, Black, White, and the additional ethnic group of Latino) are more likely to categorize East Asian faces as Asian, in comparison to Southeast Asian and South Asian faces.

Conversely, participants who are given additional information about the social category of ‘Asian’ are more likely to categorize faces of Southeast- and South Asian-appearing faces as Asian.

I initially conducted these studies on human participants, and then tried to replicate my findings on AI models.


Case 2: Methods for human participants

I created a closed card sorting task, where participants were asked to categorize photos of 96 men (sourced from the Chicago Face Database/CFD) into various categories.

Task 1: 100 participants were asked to sort faces into one of four “conventional” categories of race. This task was a conceptual replication of the methods used by CFD researchers to create their standardized database.

Task 2: 97 participants were asked to sort faces into one of two categories: Asian or non-Asian.

Task 3: 85 participants were asked to sort faces as either East Asian, Southeast Asian, South Asian, Central Asian, other Asian, or non-Asian. Additionally, participants were primed with information on the national origins of different populations who may identify as Asian. For each category, this information was added in parentheses; e.g. Central Asians (people who trace their origins to Kyrgyzstan, Tajikistan, etc.)

Screengrab of Task 3


Case 2: Results

The results showed some clear trends:

  1. Participants in Task 1 mimicked the sorting behavior of the original participants recruited for the CFD photo categorization project.
  2. Photos of self-identified Asians were much less likely to be sorted into the Asian category if they did not appear prototypically East Asian (with fair skin, straight hair, and distinctively East Asian facial features). This was the case in both Task 1 and Task 2.
  3. With Task 3’s additional information regarding the vastly diverse range of groups who might identify as Asian, participants were more likely to sort self-identified Asian photos as some Asian subgroup (instead of sorting them into the non-Asian category).
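The shift in sorting behavior between tasks can be checked with a simple two-proportion comparison. Here is a minimal sketch; the counts are illustrative placeholders, not the study's actual data, and the real analyses were run in R.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z-statistic for the difference between two proportions (pooled SE)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: 30 of 97 Task 2 participants sorted a given South Asian
# face as "Asian", versus 60 of 85 in Task 3 (with the diversity reminder).
z = two_proportion_z(30, 97, 60, 85)
print(round(z, 2))  # z well above 1.96 → significant at alpha = .05
```

A pooled standard error is the textbook choice when testing whether the two proportions are equal; with unequal task samples like these, it weights each group by its size.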


Case 2: What does this mean?

Running these tasks demonstrated 3 key points:

  1. Participants recruited in the USA are likely to interpret Asian to mean East Asian, i.e. people who trace their origins to China, Japan, Korea, or Taiwan.
  2. Nudging people to think about the diversity amongst Asian populations can increase the proportion of people who consider South- or Southeast Asian people to be Asian – in some cases doubling the proportion!
  3. Describing individuals as ‘Asian’ could lead to serious misinterpretations if the ethnicity of these individuals is not specified. For instance, research findings from a study with Thai Asians may be perceived to be applicable to – even based on – Chinese Asians. While some industry standards require researchers to be more specific in describing participants, there are still dangerous implications of using the blanket term ‘Asian’.


Case 2: Replicating Findings – Generating images with AI

Academic research is often perceived to be too theoretical to be applied to industry research. To demonstrate its effects on real-world products, I picked 2 AI image generators to test my research question:

How do AI image generators respond to reminders of Asian diversity?

To demonstrate the amplification of the “East Asian = Asian” effect in AI language models, I used ChatGPT’s Dall-E and Gemini’s Imagen 3 to generate images of an Asian individual.

I treated this as a usability test to assess whether I (the user) could prompt an AI model to generate a non-prototypical image of an Asian person.

Both models prompted me to provide additional information about the individual whose image was to be generated. In the absence of further context, however, both produced the image of an East Asian man, in keeping with the colloquial perception of Asians and Asianness.


Case 2: Replicating Findings – Generating AI Images with Diversity Reminders

To test whether AI language models can produce different results (i.e. non-East Asian faces) based on additional instructions, I used the same prompts from my card sorting Task 3 and compared the results. I made sure to clear all preferences and memories from these models to minimize personalized outputs. Unsurprisingly, both models produced East Asian-appearing results.

Output from ChatGPT using Dall-E

Output from Gemini using Imagen 3


Case 2: Replicating Findings from ChatGPT

Next, I re-ran these prompts as part of the same conversation. That is, once the AI model had:

(i) generated the image of an ‘Asian person’, I prompted it to:

(ii) generate another image of an Asian person while keeping in mind the breadth of Asian diversity.

ChatGPT attempted to improve on its previous images by presenting images of Asian women in seemingly Central Asian attire. When asked about its logical reasoning, ChatGPT/Dall-E offered a number of factors it had taken into consideration, including my reminder of Asian diversity.


Case 2: Replicating Findings from Gemini

In the case of Gemini, it appeared that the reminder of Asian diversity had less of an effect on Imagen 3’s decision to portray a stereotypically East Asian individual. When I asked the model to explain its logic, it could not provide any.

Despite this, when I asked the same question of the Gemini 2.0 Flash Thinking Experimental model, it provided a step-by-step sequence of reasoning that did not entirely match the output.


Case 2: Overview of Findings

  • Both ChatGPT and Gemini produced images of East Asian men when prompted to generate an image of an "Asian person". However, ChatGPT asked the user to provide additional descriptors, in whose absence it reverted to generating a prototypical East Asian portrait.
  • ChatGPT produced images of seemingly Central Asian women when reminded of Asian diversity. In contrast, Gemini's image output was not affected by the reminder.
  • ChatGPT’s logical reasoning accounted for its choice to generate an Asian portrait of a person who – based on the model’s logic – is relatively less prototypically East Asian. The Gemini 2.0 Flash Thinking Experimental model provided a step-by-step sequence of reasoning that did not entirely match the output.


Case 2: Again, Why Should We Care?

It was apparent that both Dall-E and Imagen 3 were resistant to creating non-prototypical Asian images. This supports conclusions drawn by previous research, which attributes the amplification of societal biases to factors such as (i) a lack of diversity in training data sets, and (ii) a lack of diversity in the teams that develop AI models.

Two ways to combat these biases include: (i) oversampling of underrepresented demographics in order to account for the lack of non-prototypical data points, and (ii) the recruitment of diverse AI development teams. This includes the recruitment of people from diverse ethnoracial backgrounds, as well as people from social science backgrounds who can offer recommendations to nudge AI models to be more inclusive.

It may be argued that AI models will self-correct their biases over time, when learning from human users. However, this course correction may take years to take effect.


Applications in Industry

  • Improving AI Image Generation Models: These findings can be used as a springboard to advocate for training AI models to generate more diverse and accurate images based on user feedback, reducing biases and stereotypes.
  • Developing Ethical AI Guidelines: Since we are currently experiencing the beginning of an AI boom, research on inclusive AI models can inform the early development of guidelines and best practices for creating and deploying AI image generation models.
  • Enhancing User Experience: Culturally-sensitive AI models can provide a more inclusive and seamless experience for users. Without user testing on a global population, AI models could inadvertently exclude markets of paying users.


Project Impact Summary

  • Users are likely to perpetuate the stereotype of “East Asian = Asian.” This is reflected in the behavior of AI models.
  • Reminders of diversity can increase the “accurate” labeling of images (of humans) by 50%, but this finding could not be reproduced by AI models.
  • Findings support previous research on the amplification of societal biases.


Questions?