1 of 33

AI for Data Analysis

Mary Ton

DH Librarian

Cadence Cordell

MLIS Maven

Jess Hagman

Social Sciences Research Librarian

2 of 33

Today’s slides

https://go.illinois.edu/ai4data

The link is case sensitive.

3 of 33

AI is already part of our research process

Text-to-speech

Image recognition

4 of 33

Goals for today

  • Identify potential issues when using AI for data analysis
  • Discuss use cases for incorporating AI into your analysis workflow
  • Describe best practices for documenting your use of AI

5 of 33

General Considerations

6 of 33

Key Limitations

“An error” as imagined by Gemini

  • Modeling ≠ meaning
  • Bias
  • Data Privacy

Check out “A Gentle Introduction to ChatGPT” for more examples.

7 of 33

Modeling ≠ meaning

[Figure: words associated with "The Morrow Plots". Labels shown: Plots, Villain, Plans, Diabolical, Morrow, corn, rotation, crop, experimental, 1843, 1876. Two candidate dates are questioned: 1843? 1876?]

8 of 33

Limited Knowledge and Bias

*Note: ChatGPT assumed that the librarian was an unmarried woman in multiple iterations of the song, even though the prompt was gender neutral.

Write a sea shanty about a librarian and a cat

Oh, Miss Lily and Whiskers, a tale to be told,

In the library they wandered, their stories unfold.

With books in their hands and a twinkle in their eyes,

They sang songs of adventure 'neath the endless skies.

9 of 33

There’s bias in image-based AI too…

A librarian and a cat

A librarian

Even the cat is white…

10 of 33

Data Privacy

  • Generative AI tools can retain the information you give them and use it to improve the model.
  • Data retention policies are not always clearly articulated.
  • It can be a violation of ethics to share sensitive information with these tools.

11 of 33

Data Cleaning

12 of 33

Tools

ChatGPT

  • Free or $$
  • More accurate
  • More conversational

Copilot Enterprise

  • Free with NetID
  • Data secure*
  • Internet connectivity

OpenRefine

  • Free
  • HIPAA Compliant
  • Not AI

13 of 33

Tasks

Sample Prompts

  • Missing Values
  • Inconsistencies
  • De-duplication
  • Creating sample data sets
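The four tasks above are the kind of code a chatbot might be asked to write. Here is a minimal sketch in plain Python, assuming a small hypothetical table of survey responses. The column names `name`, `state`, and `age`, the `STATE_NAMES` mapping, and all values are invented for illustration. Treat it as a starting point to check against your own data, not as a finished cleaning routine.

```python
import random

# Hypothetical records; every value here is invented for illustration.
rows = [
    {"name": "Ada", "state": "IL", "age": "34"},
    {"name": "ada ", "state": "Illinois", "age": "34"},
    {"name": "Grace", "state": "il", "age": ""},
]

# Assumed mapping of spelling variants to one canonical value.
STATE_NAMES = {"il": "IL", "illinois": "IL"}


def clean(records):
    """Normalize inconsistencies, mark missing values, and drop duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Inconsistencies: trim whitespace and standardize capitalization.
        name = record["name"].strip().title()
        state_key = record["state"].strip().lower()
        state = STATE_NAMES.get(state_key, record["state"].strip())
        # Missing values: keep the row but store None instead of "".
        age = int(record["age"]) if record["age"].strip() else None
        # De-duplication: compare rows on their normalized values.
        key = (name, state, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "state": state, "age": age})
    return cleaned


def sample_rows(n, seed=0):
    """Create a synthetic sample data set so that no real data is shared."""
    rng = random.Random(seed)
    return [
        {"name": f"Person {i}", "state": "IL", "age": str(rng.randint(18, 90))}
        for i in range(n)
    ]


cleaned_rows = clean(rows)
print(cleaned_rows)
```

A synthetic sample like the one from `sample_rows` can be pasted into a prompt in place of sensitive records when asking a tool for cleaning code.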

14 of 33

Generating Code

15 of 33

Tools

Gemini

  • Free
  • Google suite access
  • Internet connectivity

Copilot Enterprise on GitHub

  • Free with NetID
  • Data secure*
  • GitHub integration

16 of 33

Code and copyright*

From the US Copyright Office’s Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence

“In the Office’s view, it is well-established that copyright can protect only material that is the product of human creativity. Most fundamentally, the term ‘author,’ which is used in both the Constitution and the Copyright Act, excludes non-humans.”

Generally cannot claim copyright on text, images, and code that you generated with AI

Generally cannot claim copyright on procedures

May claim copyright on sufficiently creative prompts

May claim copyright on modifications that you make to generated text

*I am not a lawyer.

17 of 33

Sample prompts

  • Explain this code to me.
  • Pretend that you are a programmer. Write a regular expression in Python that matches email addresses.
  • I am trying to scrape a website for information about artists, performances, and locations. Which packages in Python would be most relevant?
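As an illustration of what the regular-expression prompt above might return, here is a minimal sketch using Python's standard `re` module. The pattern is a deliberately simplified assumption, not a full validator for every legal address, and the names `EMAIL_PATTERN` and `find_emails` are hypothetical. Generated patterns like this should always be tested against examples from your own data.

```python
import re

# Simplified pattern: a local part, "@", then dot-separated domain labels.
# It is intentionally not a complete validator for all legal addresses.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


sample = "Contact jhagman@illinois.edu or maryton@illinois.edu for help."
print(find_emails(sample))
```

Asking the tool to "explain this code to me" for the pattern is a quick way to check whether it matches what you intended.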

18 of 33

Analyzing Your Data

19 of 33

AI in Major Qualitative Data Analysis Programs

MAXQDA

  • AI Assist (OpenAI) tool will summarize and paraphrase text or suggest sub-codes based on the data linked with a parent code.
  • Only available to users with AI Assist license

ATLAS.ti

  • Works with OpenAI to code data, suggest codes, summarize data. Includes “conversation” with the data and sentiment analysis
  • Available in all current versions of software

NVivo

  • Automatic coding features and sentiment analysis

20 of 33

Other options

  • CoLoop, reviewed by Christina Silver
    • Automated transcription and summarization to produce an analysis grid
    • Asking questions of the data via CoLoop
  • ChatGPT, Copilot, etc.
    • Upload data and use prompts to ask questions
    • See an exploration by David Morgan in the International Journal of Qualitative Methods
    • Use prompts to brainstorm or write about your project
  • Qeludra: forthcoming tool developed by Susanne Friese
  • Authors on general qualitative data analysis strategies:

21 of 33

What can AI do for your qualitative analysis?

Interpretive qualitative data analysis relies on the researcher’s experiences and previous knowledge, as well as an in-depth understanding of the data, to develop rich and relevant analysis.

What is the AI doing? Description? Summarizing or paraphrasing? Coding?

Can the technology do that without the project-specific knowledge that you and your research team bring to the project?

What type of qualitative research are you doing?

22 of 33

Document Your Process

23 of 33

Ingredients of a good AI disclosure statement

  • User
  • Tool
  • Date and time
  • Prompt
  • Generated text
  • Sections of your writing that contain AI-generated text
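One way to keep these ingredients together is to log each interaction as a structured record. The sketch below is a hypothetical structure, not a required or standard format. The class name `AIDisclosure`, its field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AIDisclosure:
    """One logged interaction with a generative AI tool (hypothetical layout)."""

    user: str
    tool: str
    timestamp: datetime
    prompt: str
    generated_text: str
    # Sections of your writing that contain AI-generated text.
    sections: list[str] = field(default_factory=list)

    def statement(self) -> str:
        """Format the record as a short disclosure statement."""
        where = ", ".join(self.sections) or "none"
        return (
            f"{self.user} used {self.tool} on "
            f"{self.timestamp:%Y-%m-%d %H:%M} with the prompt "
            f'"{self.prompt}" Sections containing AI-generated text: {where}.'
        )


# Placeholder values, invented for illustration.
record = AIDisclosure(
    user="A. Researcher",
    tool="ChatGPT",
    timestamp=datetime(2024, 1, 15, 14, 30),
    prompt="Explain this code to me.",
    generated_text="(paste the full response here)",
    sections=["Methods"],
)
print(record.statement())
```

Saving one record per prompt, including the full generated text, makes it easier to write the final disclosure statement later.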

24 of 33

Document your decisions

Most qualitative research methods expect the researcher to be transparent and reflexive about their analysis process, making the documentation of how you have used AI even more important.

How does your own developing interpretation of the data rely on analysis you can get from an AI-based tool?

25 of 33

Conclusions

26 of 33

Questions to consider

AI

Human

Automation Continuum

  • How sensitive is the data that I am working with?
  • Not sensitive data
  • Sensitive data

27 of 33

Questions to consider

AI

Human

Automation Continuum

  • How sensitive is the data that I am working with?
  • Am I looking for patterns or connections?
  • Pattern Recognition
  • Connections
  • Meaning

28 of 33

Questions to consider

AI

Human

Automation Continuum

  • How sensitive is the data that I am working with?
  • Am I looking for patterns or connections?
  • Which is more important: speed or depth?
  • Speed
  • Depth

29 of 33

Questions to consider

AI

Human

Automation Continuum

  • How sensitive is the data that I am working with?
  • Am I looking for patterns or connections?
  • Which is more important: speed or depth?
  • Is the goal of this data analysis project to listen deeply?
  • Exploratory
  • Non-policy related
  • Deep Listening
  • Policy

30 of 33

Resources

31 of 33

Library Resources

Level up your skills through the Savvy Researcher workshop series:

https://go.illinois.edu/sr

Check out previous workshops on AI, copyright, and text mining on the DH@Illinois Media Space Channel:

https://go.illinois.edu/dhchannel

NEW! Generative AI LibGuide:

https://guides.library.illinois.edu/generativeAI

32 of 33

Questions?

Jess Hagman (jhagman@illinois.edu)

  • Social Science Research Projects
  • Choosing and setting up technology for qualitative data analysis

Mary Ton (maryton@illinois.edu)

  • AI in the Humanities
  • Acquiring text mining data

Sara Benson (srbenson@illinois.edu)

  • Copyright questions

33 of 33

Additional Resources