Data-Driven Decision Making: Managing Risk, Diversity & Model Auditing
Riddho R. Haque
Supervised by Peter J. Haas & Alexandra Meliou
VLDB PhD Workshop 2025
Collaborators
Anh L. Mai
Matteo Brucato
Azza Abouzied
Peter J. Haas
Alexandra Meliou
Marco Serafini
Funded By:
Data-Driven Decisions Are Important
Data-Driven Decision Making
Finance
Transportation
Manufacturing
Healthcare
Journalism
Current decision-making workflows are not enough
Data: Uncertain, Multimodal, Big
Solvers / Optimizers: Main-memory bound. Can’t handle many dimensions. Managing risk, diversity, etc. scalably.
Predictive AI Models: Might be stealing your data. Uninterpretable. Unpoliced.
Why use databases to make your decisions?
Keep your data in the DB.
[SIGMOD08, SIGMOD13]
Application-independent, declarative SQL extensions for decision-making.
[VLDB16, SIGMOD20, VLDB25]
Scale beyond main-memory limitations. [VLDB16, SIGMOD20, VLDB24, VLDB25]
DB
Main Memory
What’s next for In-DB Decision-Making?
Scaling to Large Probabilistic Relations
Decisions based on multimodal data.
Interpretability, transparency, and explainability.
My PhD Dissertation Work
Large-scale stochastic optimization
[VLDB 25]
Multistage Decision-Making
Explainability
Diversity-Aware Decision-Making
(From multimodal data)
Auditing ML-Driven Decisions
Part I: Large Scale Stochastic Optimization
Large Scale Stochastic Optimization
Which stocks should I invest in?
I want returns, but I fear taking risks.
In-DB Solver
[VLDB 25]
SQL extensions for specifying risk constraints and objectives.
[SIGMOD 20, VLDB 25]
Company | Sell In (Days) | … | How many shares? |
GOOG | 278 | … | 1 |
MSFT | 648 | … | 2 |
STBZ | 341 | … | 3 |
EQS | 614 | … | 1 |
Created Portfolio
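One way to picture a risk constraint on a portfolio like the one above is as a probabilistic condition checked over sampled scenarios. The sketch below is illustrative only and is not the in-DB solver from [VLDB 25]: the function names, the tickers "A" and "B", and the per-share gain scenarios are all made up for the example.

```python
def portfolio_gains(shares, scenarios):
    """Total portfolio gain in each scenario.

    shares:    dict ticker -> number of shares held
    scenarios: dict ticker -> list of per-share gains, one entry per scenario
    """
    n_scenarios = len(next(iter(scenarios.values())))
    return [
        sum(count * scenarios[ticker][s] for ticker, count in shares.items())
        for s in range(n_scenarios)
    ]


def satisfies_risk_constraint(shares, scenarios, max_loss, min_prob):
    """True if the loss stays within max_loss in at least min_prob of the scenarios."""
    gains = portfolio_gains(shares, scenarios)
    ok = sum(1 for g in gains if g >= -max_loss)
    return ok / len(gains) >= min_prob


def expected_gain(shares, scenarios):
    """Scenario average of the total gain (a candidate objective to maximize)."""
    gains = portfolio_gains(shares, scenarios)
    return sum(gains) / len(gains)


# Hypothetical data: four scenarios of per-share gains for two made-up tickers.
scenarios = {"A": [1.0, -2.0, 3.0, 1.0], "B": [0.5, 0.5, -1.0, 0.5]}
shares = {"A": 1, "B": 2}

print(portfolio_gains(shares, scenarios))
print(satisfies_risk_constraint(shares, scenarios, max_loss=0.5, min_prob=0.75))
print(expected_gain(shares, scenarios))
```

A solver would search over the share counts; this sketch only evaluates one candidate portfolio against the constraint and the objective.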
In-DB Solvers for Large-Scale Stochastic Optimization [VLDB 25]
Simpler feasible set → faster to solve
More details tomorrow (Sep 2, 2025) 3:45 – 5:15 pm,
4F Wordsworth
Research-21
Part II: Diversification with Optimization
The Need For Diversity: Sampling Tweets
“How the Internet Reacted to the Arab Spring”
“Get me 50 tweets that fit into the front page, with as many likes as possible”
In-DB Solver
Sample of Tweets
The retrieved tweets may be too similar.
The retrieved tweets may not cover all opinions.
Creating diverse and representative samples
Balancing optimality with diversity and coverage.
Diversity
Coverage
No two sampled tweets are ‘too close’
No unsampled point is ‘too far’ from sampled points
Text to embedding
Measuring similarity between tweets.
High-dimensional embeddings
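The two conditions above can be written as simple distance checks over embeddings. This is a minimal sketch, assuming Euclidean distance and two hypothetical thresholds (`min_dist` for "too close", `max_dist` for "too far"); the slides do not fix a distance measure, and the 2-D points stand in for high-dimensional tweet embeddings.

```python
import math
from itertools import combinations


def is_diverse(sample, min_dist):
    """Diversity: no two sampled points are closer than min_dist."""
    return all(math.dist(u, v) >= min_dist for u, v in combinations(sample, 2))


def is_covering(points, sample, max_dist):
    """Coverage: every point lies within max_dist of some sampled point."""
    return all(
        any(math.dist(p, s) <= max_dist for s in sample)
        for p in points
    )


# Toy 2-D "embeddings" (made up for illustration).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 0.1), (5.0, 5.0)]

sample = [(0.0, 0.0), (1.0, 0.0)]
print(is_diverse(sample, min_dist=0.5))            # sampled points are 1.0 apart
print(is_covering(points, sample, max_dist=0.5))   # (5.0, 5.0) is left uncovered
```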
Creating diverse and representative samples
Diversity Constraint
Coverage Constraint
Orders of magnitude runtime improvement over past techniques.
[ICDT 22, SDM 23]
Minimum Set Cover
Diverse Sampling
Scalable Optimization
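To make the interplay of the two constraints concrete, here is a simple greedy baseline that is not the technique from the talk: it scans items from highest to lowest score and keeps an item only if it is farther than `radius` from everything already kept. The result is diverse (kept items are more than `radius` apart) and covering (every skipped item is within `radius` of a kept one). The function name, the single shared `radius`, and the toy data are assumptions, and the sketch neither enforces a sample-size budget nor guarantees an optimal total score.

```python
import math


def greedy_diverse_cover(items, radius):
    """items: list of (score, embedding). Returns indices of the kept items."""
    # Visit items from highest to lowest score (e.g. number of likes).
    order = sorted(range(len(items)), key=lambda i: -items[i][0])
    chosen = []
    for i in order:
        # Keep item i only if it is farther than radius from every kept item.
        if all(math.dist(items[i][1], items[j][1]) > radius for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: (likes, 2-D embedding), made up for illustration.
items = [
    (10, (0.0, 0.0)),
    (9, (0.1, 0.0)),
    (8, (1.0, 0.0)),
    (7, (1.0, 0.1)),
    (1, (5.0, 5.0)),
]
print(greedy_diverse_cover(items, radius=0.5))
```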
Part III: Auditing ML-Driven Decisions
ML Models Are Increasingly Used in In-DB Decision-Making
Predicting Future Stock Prices
Hate Speech
Racism
Spam
Misinformation
Filtering Toxic Content from Social Media
Are ML Models Violating Copyrights/Privacy?
Trained on Copyrighted Art?
Targeted Ads Based On Private Conversations?
The New Yorker, 2025
Auditing Binary Classifiers
[Figure: scatter plots of training points labeled + and −, with query points marked ?]
Different Possible Decision Boundaries given a training set.
Oracle queries tell us where the decision boundary passes through.
Unexplained perturbations → Model could have been trained on ‘hidden’ data points.
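As an illustration of how oracle queries can reveal where a decision boundary passes, the sketch below bisects the segment between one negatively and one positively classified point. It covers only this boundary-location step, not the audit itself; `locate_boundary` and the linear `oracle` are hypothetical stand-ins for a black-box classifier.

```python
def locate_boundary(oracle, neg_point, pos_point, tol=1e-6):
    """Bisect the segment from neg_point to pos_point.

    oracle(x) returns True when the model labels x as '+'.
    Returns (approximate boundary point, number of bisection queries).
    """
    if oracle(neg_point) or not oracle(pos_point):
        raise ValueError("expected a '-' endpoint and a '+' endpoint")

    def point_at(t):
        # Linear interpolation between the two endpoints.
        return tuple(a + t * (b - a) for a, b in zip(neg_point, pos_point))

    lo, hi = 0.0, 1.0  # oracle is '-' at t = lo and '+' at t = hi
    queries = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        queries += 1
        if oracle(point_at(mid)):
            hi = mid
        else:
            lo = mid
    return point_at((lo + hi) / 2), queries


# Hypothetical black-box model: a linear classifier with boundary x + y = 1.
def oracle(x):
    return x[0] + x[1] > 1.0


boundary_point, n_queries = locate_boundary(oracle, (0.0, 0.0), (2.0, 2.0))
print(boundary_point, n_queries)
```

Each query halves the interval, so the query count grows only logarithmically with the requested precision. If the segment crosses the boundary more than once, bisection finds just one of the crossings.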
Summary
Paper