1 of 10

NL-to-SQL Agent

Presentation @ Blue Yonder by Morgan

Brief in the speaker notes

The assignment: is to design an AI chatbot solution able to answer an executive's questions on their company's activity, based on the Northwind sample dataset and using text2sql techniques. Functionally, the solution should:

Take a user prompt
Connect to a database with the Northwind data loaded in it
Generate SQL queries accordingly and uses the result to generate the answer

The proposed solution should be robust and reliable, identifying possible risks, and should ensure they're mitigated. Choice of technologies or LLM interaction patterns (Agentic, CoT, ReAct ...) is not restricted.

Northwind database: https://github.com/jpwhite3/northwind-SQLite3

Example questions:

What is the current status of our inventory?
What are the year-over-year sales trends?
Who are our top 3 customers by sales volume?
How much did our top customer order over the last six months?

There is not an advised amount of time we would like you to spend on this, however we understand you are busy, so we only ask you invest the time that you can into this. In the call we will discuss what you would have done if you had more time.

2 of 10

Why?

A lot of companies have a lot of data
Between this data and business people, there is an engineering wall
This usually leads to:

Very few people in the organization are using the data
The data team complaining about being inundated with requests from business teams

Welcome Natural Language to SQL (NL-to-SQL)!

The promise: Chat with your data

3 of 10

2 use cases

SQL co-pilots — that deliver a first draft SQL to a data scientist who can modify it
Natural language querying — where there is no human in the loop, the user does not see the intermediate SQL nor the underlying schema

In this presentation, we will be looking at NL querying.

SQL co-pilots is the current working use case, it increases the data team scalability but does not fully solve the problem.

4 of 10

NL-to-SQL in practice

The features built on text-to-SQL are mediocre at best and annoying at worst.

It is at a point where some are questioning if users even want these features at all -> “AI washing” (https://techcrunch.com/2024/08/20/is-your-company-ai-washing-rippling-founder-parker-conrad-thinks-it-might-be/ )

There are few if any solutions that actually work…

5 of 10

Current problems

Non-deterministic — similar prompts can create different answers, as shown in the example below.
Hard to train — there is no direct correlation between time invested in adding context and the resulting accuracy. It is a trial and error approach.
Inaccurate — as a result of the above, real-world scenarios have accuracy of 60% (as I heard)

Elephant in the room: After only one failure, a user might doubt the technology or ignore it altogether

6 of 10

Current solutions

Connect the tool to your existing relational database to a subset of your schemas, tables, columns, or views
Add “context” by providing verified samples for few shot prompting or fine-tuning. Add more context by chunking and adding metadata from unstructured sources like business documentation.
Build an LLM agent to build a prompt with the most relevant semantic context, execute it against the DB, and iterate to recover from errors

7 of 10

Demo time

http://qualid.morgangiraud.com:5000/

This is an example of a basic direct use of NL-to-SQL using an Open Source LLM

Remarks

There is no fancy tricks so far
It works surprisingly well for simple queries!
It starts to be broken as soon as something exotic exist in the DB
It can’t seem to correct itself

8 of 10

Improving LLM performance

Generalized Chain-of-Thought

LLM always improved their confidence in an answer when they can first add some reasoning about the question
Making a tree of reasoning and voting for the best query

Structured output

After the reasoning steps, We can ensure that the LLM outputs theoretically valid SQL statements for a given DB (CFG + schema aware)

LLM-modulo framework

LLM should and must be getting feedback from software 1.0 (validator, optimizer)

Agent framework:

9 of 10

Second demo