1 of 9

Large Language Models for

Data Management Tasks

Anna Fariha

University of Utah

Northwest Database Society (NWDS) Annual Meeting 2024

2 of 9

Questions to think about …

Can LLMs do cardinality estimation and query optimization?

Can LLMs help in database index tuning?

Can LLMs help with homogenizing data formats?
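To make "homogenizing data formats" concrete, here is a minimal sketch of the kind of small routine one might ask an LLM to produce for a column of mixed date strings. The function name `to_iso_date`, the list `CANDIDATE_FORMATS`, and the sample values are illustrative assumptions, not material from the talk.

```python
from datetime import datetime

# Assumed set of formats present in the column; a real dataset may need others.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None

mixed = ["2024-03-01", "03/01/2024", "1 Mar 2024", "sometime in March"]
homogenized = [to_iso_date(value) for value in mixed]
print(homogenized)
```

The sketch reads "03/01/2024" as month-first, which is itself an assumption about the data that the code cannot check on its own.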

3 of 9

Which data-management tasks are well suited for LLMs?

Should I use ChatGPT for cleaning up the addresses?

4 of 9

What factors of a task determine an LLM’s suitability for it?

5. Uncertainty
Objectiveness of the task
Risk-level of the task
User trust

4. Code Requirement
Is the mechanism required?
Destructive side effect (deletion)

3. Domain Expertise
What denotes missing value?
What are valid values?
What outliers are expected?

2. System Context
Query workload
Database configuration
Hardware

1. Data Context
Schema
Data distribution
Data format
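As an illustration of the "Data Context" level (schema, data distribution, data format), the sketch below collects those three things from a handful of rows into a short text summary that could accompany a request to an LLM. The function name `summarize_data_context` and the sample rows are hypothetical; the talk does not prescribe any particular way of gathering this context.

```python
from collections import Counter

def summarize_data_context(rows, max_examples=3):
    """Describe each column: value types, missing count, distinct count, common values."""
    columns = sorted({key for row in rows for key in row})
    lines = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        type_names = sorted({type(v).__name__ for v in present})
        top = Counter(present).most_common(max_examples)
        examples = ", ".join(f"{value!r} (x{count})" for value, count in top)
        lines.append(
            f"{col}: types={type_names}, "
            f"missing={len(values) - len(present)}/{len(values)}, "
            f"distinct={len(set(present))}, most common: {examples}"
        )
    return "\n".join(lines)

# Illustrative rows only; None stands in for a missing value here.
rows = [
    {"city": "Seattle", "zip": "98101"},
    {"city": "Seattle", "zip": None},
    {"city": "Portland", "zip": "97201"},
]
context = summarize_data_context(rows)
print(context)
```

Treating `None` as the only marker of a missing value is an assumption; what actually denotes a missing value is a domain-expertise question that sits higher in the list.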

5 of 9

Interviews with 14 data scientists [Chopra et al. 2023]

6 of 9

Results of a survey of 114 data scientists [Chopra et al. 2023]

7 of 9

Whether to ask for the mechanism or the result?

Mechanism: more control, difficult to verify, reusable

Result: less control, easy to verify, not reusable
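The contrast can be sketched with the address-cleaning question from earlier in the deck. Below, `clean_address` stands in for a mechanism (code one might ask an LLM to write), while `returned_result` stands in for a result handed back directly. Both names, the `SUFFIXES` table, and the sample addresses are hypothetical.

```python
import re

# Mechanism: a small function that can be read, edited, and re-run.
# The suffix table is deliberately tiny and is an assumption for illustration.
SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue",
            "ave.": "Avenue", "rd": "Road", "rd.": "Road"}

def clean_address(raw):
    """Collapse whitespace, capitalize each word, and expand a few street suffixes."""
    words = re.sub(r"\s+", " ", raw.strip()).split(" ")
    cleaned = []
    for word in words:
        expanded = SUFFIXES.get(word.lower())
        cleaned.append(expanded if expanded else word.capitalize())
    return " ".join(cleaned)

# Result: cleaned values returned directly, covering only the rows that were sent.
returned_result = ["123 Main Street", "45 Oak Avenue"]

# The mechanism reproduces those rows and also applies to rows it has never seen.
recomputed = [clean_address(a) for a in ["  123  main st ", "45 OAK ave."]]
new_row = clean_address("9 elm rd")
print(recomputed, new_row)
```

The result list can be checked by reading it, but it has to be requested again for every new batch. The function can be reused, yet trusting it means reviewing its logic and edge cases, which is where the verification cost moves.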

8 of 9

Identify low-hanging fruit!

Data cleaning

Data organization and categorization

Data summarization

9 of 9

Thank you

Bhavya Chopra, Ananya Singha, Anna Fariha, Sumit Gulwani, Chris Parnin, Ashish Tiwari, and Austin Z. Henley. 2023. Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities. CoRR abs/2310.16164.
Andrew M. McNutt, Chenglong Wang, Robert A. DeLine, and Steven M. Drucker. 2023. On the Design of AI-Powered Code Assistants for Notebooks. In CHI ’23.
Noah Hollmann, Samuel Müller, and Frank Hutter. 2023. LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering. arXiv:2305.03403.
Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. 2023. Challenges and Applications of Large Language Models.
Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In CHI ’22 Extended Abstracts.