1 of 9

Large Language Models for Data Management Tasks

Anna Fariha

University of Utah

Northwest Database Society (NWDS) Annual Meeting 2024

2 of 9

Questions to think about …

Can LLMs do cardinality estimation and query optimization?

Can LLMs help in database index tuning?

Can LLMs help with homogenizing data formats?

3 of 9

Which data-management tasks are well suited for LLMs?

Should I use ChatGPT for cleaning up the addresses?

4 of 9

What factors of a task determine an LLM’s suitability for it?

5. Uncertainty

  • Objectivity of the task
  • Risk level of the task
  • User trust

4. Code Requirement

  • Is the mechanism required?
  • Destructive side effects (e.g., deletion)

3. Domain Expertise

  • What denotes a missing value?
  • Which values are valid?
  • Which outliers are expected?

2. System Context

  • Query workload
  • Database configuration
  • Hardware

1. Data Context

  • Schema
  • Data distribution
  • Data format
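To make the lower levels concrete, here is a minimal sketch of how data context (distribution, format) and one piece of domain expertise (what denotes a missing value) might be collected into a short text summary before any LLM is involved. Everything in it is an assumption for illustration: the `MISSING_TOKENS` set, the `visit_date` column, and the helper names are hypothetical and do not come from the talk.

```python
import re
from collections import Counter

# Hypothetical domain expertise: tokens that denote a missing value here.
MISSING_TOKENS = {"", "n/a", "na", "null", "-", "unknown"}


def shape_of(value: str) -> str:
    """Abstract a value into a coarse format pattern, e.g. '2024-01-05' -> '9999-99-99'."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", shape)


def profile_column(name: str, values: list) -> dict:
    """Summarize one column: row count, missing count, distinct count, top formats."""
    present = [v.strip() for v in values if v.strip().lower() not in MISSING_TOKENS]
    shapes = Counter(shape_of(v) for v in present)
    return {
        "column": name,
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top_formats": shapes.most_common(3),
    }


def build_context(profiles: list) -> str:
    """Render column profiles as plain text that could accompany a request."""
    lines = []
    for p in profiles:
        formats = ", ".join(f"{shape} ({count})" for shape, count in p["top_formats"])
        lines.append(
            f"- {p['column']}: {p['rows']} rows, {p['missing']} missing, "
            f"{p['distinct']} distinct; formats: {formats}"
        )
    return "\n".join(lines)


# Illustrative data only.
dates = ["2024-01-05", "01/07/2024", "N/A", "2024-02-11", ""]
profile = profile_column("visit_date", dates)
print(build_context([profile]))
```

The summary shows that the column mixes two date formats and has two missing entries, which is the kind of context a model cannot infer from a bare task description.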

5 of 9

Interviews with 14 data scientists [Chopra et al. 2023]

6 of 9

Results of a survey of 114 data scientists [Chopra et al. 2023]

7 of 9

Should you ask for the mechanism or the result?

  • Mechanism: more control, difficult to verify, reusable
  • Result: less control, easy to verify, not reusable
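As an illustration of what asking for the mechanism can look like, the sketch below shows the kind of small function one might request for homogenizing date formats, instead of asking for the cleaned values directly. It is a hypothetical example, not code from the talk: the `KNOWN_FORMATS` list, the function names, and the sample column are all assumptions. Treating `01/07/2024` as month/day/year is also a domain assumption that would need confirming.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats; a real dataset may need a different list.
# Reading "01/07/2024" as month/day/year is an assumption.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def homogenize(values: list) -> tuple:
    """Normalize a column without destroying data.

    Values that cannot be parsed are kept unchanged and their row
    indices are reported, so nothing is silently dropped.
    """
    cleaned, unresolved = [], []
    for index, value in enumerate(values):
        iso = normalize_date(value)
        if iso is None:
            cleaned.append(value)
            unresolved.append(index)
        else:
            cleaned.append(iso)
    return cleaned, unresolved


# Illustrative data only.
column = ["2024-01-05", "01/07/2024", "05.03.2024", "next Tuesday"]
cleaned, unresolved = homogenize(column)
print(cleaned)
print(unresolved)
```

The function can be rerun on new data and its rules can be edited, which matches the control and reusability points above. Checking it still means reading the code and testing it on edge cases, whereas a directly returned result can be spot-checked but has to be regenerated for every new batch.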

8 of 9

Identify the low-hanging fruit!

Data cleaning

Data organization and categorization

Data summarization

9 of 9

Thank you

  • Bhavya Chopra, Ananya Singha, Anna Fariha, Sumit Gulwani, Chris Parnin, Ashish Tiwari, and Austin Z. Henley. 2023. Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities. arXiv:2310.16164
  • Andrew M. McNutt, Chenglong Wang, Robert A. DeLine, and Steven M. Drucker. 2023. On the Design of AI-Powered Code Assistants for Notebooks. In CHI ’23.
  • Noah Hollmann, Samuel Müller, and Frank Hutter. 2023. LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering. arXiv:2305.03403
  • Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. 2023. Challenges and Applications of Large Language Models.
  • Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In CHI ’22 Extended Abstracts.