1 of 54

Recommender Systems with Fairness Considerations and Strategic Agent Dynamics

Krishna Acharya
PhD Dissertation Defense (March 03, 2026)

Dr. Jacob Abernethy

Dr. Vidya Muthukumar

Dr. Aaron Roth

Dr. Kai Wang

Dr. Juba Ziani (Advisor)

Committee Members

Krishna Acharya

Speaker


Recommender systems are everywhere


Preliminaries: Users, Items, Recommendation Model


Recommender models: A timeline


Thesis Statement


“This dissertation studies three challenges in recommender systems: ensuring fair performance across heterogeneous user populations, characterizing how strategic content producers shape the item catalog, and understanding how the shift to LLM-based semantic recommendation reopens these challenges in a fundamentally new setting”


Overview


P1) User Fairness

    • Oracle Efficient Algorithms for Groupwise Regret [ICLR 24]
    • Improving Minimax Group Fairness in Sequential Recommendation [ECIR 25]

P2) Strategic creators & item catalog evolution

    • Producers Equilibria and Dynamics in Engagement-Driven Recommender Systems [TMLR 25]

P3) LLM-based recommendation

    • GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation [OARS@KDD25]

P4) Conclusion & Future directions


Oracle Efficient Algorithms for Groupwise Regret


Krishna Acharya, Eshwar Ram Arunachaleswaran,

Sampath Kannan, Aaron Roth, Juba Ziani, ICLR 2024

P1) User Fairness


Recap: Online learning

[Figure: prediction rounds t = 1, 2, …, 7]


Online learning with groupwise regret

[Figure: rounds t = 1, …, 7; each user belongs to overlapping groups by Age {Old, Young} and Race {White, Black}]


Prior work: Sublinear regret but computationally intractable



Snapshot of our algorithm


[Diagram: group experts (old, young, white, always-active) each produce a prediction (Pred-young, Pred-white, Pred-agnostic); AdaNormalHedge aggregates them into the final prediction]

Update the internal state of only the active groups' experts

(here: young, white, always-active)
 


Experiments


  • Regret comparison
    • Our algorithm: online ridge regression as group experts + AdaNormalHedge
    • Benchmark: online ridge regression [Azoury&Warmuth’01]

  • Datasets: Census-income, medical-costs

[Plot: regret of Our Algorithm vs. the benchmark on both datasets]


Improving Minimax Group Fairness in Sequential Recommendation


Krishna Acharya, David Wardrope, Timos Korres,

Aleksandr Petrov, Anders Uhrenholt, ECIR 2025

P1) User Fairness


Task: Sequential Recommendation


Given: Sequence of items a user has viewed

Predict: most likely next item.


Model: Self Attentive Sequential recommendation (SASRec)


Transformer model for sequential recommendation

SASRec: Self-Attentive Sequential Recommendation [Kang & McAuley’18]


User Fairness in Recommendation


  • Data & algorithms 🡪 unfair user outcomes
    • Models are highly accurate on aggregate
    • But perform badly on some user segments:
      • Popularity bias
      • Cold-start users

[Plot: item popularity distribution; head vs. tail items]


Group fairness

Users segmented on

    • Demographic features: sex, race, …
    • Functional characteristics: #views, #buys

Equalize metrics across groups.

    • Ideal: reduce the loss of the disadvantaged group.
    • In practice, algorithms can instead inflate the loss of the advantaged group.

  • Problems:
    • Intersectionality: users belong to multiple groups
    • Legally prohibited fields


Minimax group fairness

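In symbols (notation mine, not the slide's): minimax group fairness picks the model whose worst-case group loss is smallest,

```latex
\min_{\theta}\; \max_{g \in \mathcal{G}}\; \mathcal{L}_g(\theta),
\qquad
\mathcal{L}_g(\theta) = \mathbb{E}_{(x,y)\sim P_g}\big[\ell(f_\theta(x),\, y)\big],
```

where \(\mathcal{G}\) is the set of user groups and \(P_g\) is the data distribution of group \(g\).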


Distributionally Robust Optimization (DRO)

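In symbols (again my notation): DRO minimizes the worst-case expected loss over an uncertainty set \(\mathcal{U}(P)\) of distributions around the training distribution \(P\),

```latex
\min_{\theta}\; \sup_{Q \in \mathcal{U}(P)}\; \mathbb{E}_{z \sim Q}\big[\ell(\theta; z)\big].
```

Taking \(\mathcal{U}(P)\) to be the set of per-group distributions recovers minimax group fairness.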


Existing DRO approaches for recommendation



Limitations of group based DRO methods


GroupDRO & Streaming DRO have major limitations:

  1. Need group membership during training
    • Users cannot belong to multiple groups; does not scale to intersecting groups
  2. Performance drop observed under group imbalance


Conditional Value at Risk (CVaR) DRO

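A minimal NumPy sketch of the CVaR objective in its Rockafellar-Uryasev form (the function name and the empirical-quantile shortcut are mine; in training, this term replaces the average loss over the batch):

```python
import numpy as np

def cvar_loss(losses, alpha=0.1):
    """CVaR_alpha of per-example losses via the Rockafellar-Uryasev form:
        min_eta  eta + E[(loss - eta)_+] / alpha.
    For an empirical sample the minimizing eta is the (1 - alpha) quantile,
    so CVaR reduces to the mean of the worst alpha-fraction of losses."""
    losses = np.asarray(losses, dtype=float)
    eta = np.quantile(losses, 1.0 - alpha)
    return eta + np.mean(np.maximum(losses - eta, 0.0)) / alpha
```

Because the objective only looks at the tail of the loss distribution, it needs no group labels at all: the "groups" it protects are implicitly the worst-off alpha-fraction of users.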


Experiments

Normalized Discounted Cumulative Gain

Leave-one-out split
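Under the leave-one-out split there is exactly one held-out relevant item per user, so both metrics have a closed form per user; a sketch (function and argument names are mine):

```python
import math

def hit_and_ndcg_at_k(ranked_item_ids, target_item, k=5):
    """Leave-one-out evaluation: one held-out target per user, so
    HR@k is a hit indicator and NDCG@k reduces to 1 / log2(rank + 1)."""
    topk = list(ranked_item_ids)[:k]
    if target_item not in topk:
        return 0.0, 0.0
    rank = topk.index(target_item) + 1   # 1-indexed position in the top-k
    return 1.0, 1.0 / math.log2(rank + 1)
```

Per-user values are then averaged over all users to get the reported HR@5 and NDCG@5.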


User groups

  1. Ratio of popular items in the user's history
     Gpop = {niche, diverse, popular}
  2. User interaction length
     Gseq = {short, medium, long}

We experiment across thresholds and the resulting group splits.


Single-group setting: DRO is effective, CVaR DRO best


  • CVaR DRO obtains the highest NDCG across all groups & on aggregate
  • Even for highly imbalanced group splits (Gpop1080):
    • CVaR remains best; GroupDRO & Streaming DRO perform similarly to standard training.

[Table: per-group NDCG; standard training shown for comparison]


Multi-group setting: CVaR DRO shines


  1. Group-based (GDRO, SDRO)
    • Highly sensitive to the choice of "atoms" for the DRO loss, which is impossible to know before training.
  2. CVaR DRO
    • Group-agnostic & outperforms on aggregate and on 5/6 groups.

[Tables: popularity-based groups; sequence-length groups]


Takeaways

  1. Standard training (ERM) has poor performance on user segments
  2. Group-based DRO methods:
    • Groups needed upfront; cannot scale to intersecting groups
    • Performance degrades with imbalance
  3. CVaR DRO doesn't suffer from the above; it outperforms groupwise and on aggregate.


Producer equilibria and dynamics in engagement-driven recommender systems


Krishna Acharya, Varun Vangala, Jingyan Wang, Juba Ziani, TMLR 2025

P2) Strategic creators

29 of 54

Content creation game amongst producers

29

Users

Embedding space

How to maximize user engagement?

Producers

Alice

Bob


Modelling producer competition

[Equation: probability of user k seeing producer i's content, defined via relevance scores]


Content serving rules

  • Softmax

  • Top-K Softmax
    • Filter the top-K scores & softmax

  • Greedy
    • Keep only the top score

  • Luce/Linear rule

  • Round-robin/random
    • Independent of producer content

Lower temperature T and smaller top-K make serving greedier.


Result: Producer strategy at Nash eq supported on basis vectors



Structure of Equilibria & Producer specialization

  1. Producer specialization 🡪 catalog diversity
    • Each producer concentrates on a distinct content niche
    • Yields a heterogeneous catalog
  2. No specialization 🡪 catalog collapse
    • Every producer concentrates on the same popular content
    • Yields a homogeneous catalog

Equilibria for serving rules

    • Linear (Luce) rule: specialization
    • Round-robin: no specialization
    • Softmax: depends on temperature and top-K (experiments)


Experiments

  1. User embeddings:
    • ML100K, Amazon [Hou et al. '23]
  2. Producers: best-response dynamics
  3. At equilibrium, measure:
    • Catalog diversity
    • Producer utility
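A toy version of the best-response loop, restricted to basis-vector strategies as the equilibrium result suggests. The softmax serving rule and all names here are illustrative, not the paper's implementation.

```python
import numpy as np

def best_response_dynamics(users, n_producers, T=0.5, iters=50, seed=0):
    """Producer i plays basis vector e_{choice[i]}; its utility is its total
    softmax exposure probability summed over all users. Producers take turns
    best-responding until no one has a profitable deviation (a Nash eq.)."""
    rng = np.random.default_rng(seed)
    n_users, d = users.shape
    choice = rng.integers(d, size=n_producers)   # random initial strategies

    def utility(i, c):
        cols = [c if j == i else choice[j] for j in range(n_producers)]
        rel = users[:, cols]                     # (n_users, n_producers) scores
        p = np.exp(rel / T)
        return (p[:, i] / p.sum(axis=1)).sum()   # i's exposure, summed over users

    for _ in range(iters):
        moved = False
        for i in range(n_producers):
            best = max(range(d), key=lambda c: utility(i, c))
            if utility(i, best) > utility(i, choice[i]) + 1e-12:
                choice[i], moved = best, True
        if not moved:                            # no profitable deviation
            return choice
    return choice
```

Measuring how many distinct basis vectors appear in `choice` gives a crude catalog-diversity statistic at convergence.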


Result: Greedier serving leads to catalog diversity

[Plot: catalog diversity increases as serving gets greedier]


Result: Producer utility increases with greedier serving

[Plots: producer utility under round-robin, linear, full softmax, top-20 and top-10 softmax serving; utility rises as serving gets greedier]


GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation


Krishna Acharya, Aleksandr V. Petrov, Juba Ziani

Presented at OARS@KDD2025

P3) LLM-based recommendation


Task: Sequential Recommendation


Given: Sequence of items a user has viewed

Predict: most likely next item.


Identifier (ID) based sequential recommenders

✅ Pros

    • Fast training:
      • Small models (5-10M parameters)
    • Fast inference:
      • Items stored in a nearest-neighbor (NN) index 🡪 fast relevance scoring

❌ Cons

  1. Learn embeddings for each item
    • Embeddings do not generalize across surfaces
  2. Cold-start
    • Retrain for new items
    • Lower performance for new users


LLM based recommendation


  1. How to incorporate catalog knowledge: in-context (IC), RAG, finetuning?
  2. What generation strategy to use?
  3. How many candidate texts to generate?
  4. How do we ground back to the item catalog?

Generate the next item's title using an LLM.


GLoSS Architecture

  1. Verbalize the user's item history into text using item metadata
  2. Finetune LLaMA-3 with QLoRA
  3. Generate candidate texts for the next likely item
  4. Retrieve closely matching items from the item catalog


Candidate text generation

  1. Deterministic: beam search decoding
  2. Sampling: temperature, top-K softmax ❌
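Deterministic decoding can be illustrated with a minimal beam search over an abstract next-token scorer; this toy stands in for LLM decoding, and the names are mine.

```python
import math

def beam_search(next_token_logprobs, start, beam_width=3, max_len=5, eos="</s>"):
    """Minimal deterministic beam search: keep the beam_width highest
    log-probability partial sequences at each step. `next_token_logprobs`
    maps a sequence (tuple of tokens) to a {token: logprob} dict."""
    beams = [((start,), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            if seq[-1] == eos:
                candidates.append((seq, lp))        # finished beams carry over
                continue
            for tok, tok_lp in next_token_logprobs(seq).items():
                candidates.append((seq + (tok,), lp + tok_lp))
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_width]
    return beams
```

Returning the surviving beams (rather than only the top one) is what supplies multiple candidate texts to ground against the catalog.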


Retrieving the closest matching items

  1. Sparse: keyword-overlap based (TF-IDF, BM25)
  2. Dense/semantic: embedding based (E5, Qwen-embedder)
    • E5-small: 33M params, d=384
    • E5-base: 110M params, d=768
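Grounding works by embedding both the generated candidates and the catalog titles, then ranking items by cosine similarity. A self-contained sketch with a bag-of-words stand-in for the E5 embedder; all names are illustrative.

```python
import numpy as np

def embed(text, vocab):
    """Stand-in embedder (L2-normalized bag-of-words over a fixed vocabulary);
    GLoSS uses dense E5 text embeddings here."""
    v = np.array([float(text.lower().split().count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

def ground_to_catalog(generated_texts, catalog_titles, top_n=5):
    """Ground free-form generated candidates back to real catalog items
    by cosine similarity in embedding space (dense/semantic retrieval)."""
    vocab = sorted({w for t in catalog_titles + generated_texts
                    for w in t.lower().split()})
    C = np.stack([embed(t, vocab) for t in catalog_titles])
    Q = np.stack([embed(g, vocab) for g in generated_texts])
    best = (C @ Q.T).max(axis=1)          # best cosine match per catalog item
    order = np.argsort(-best)[:top_n]
    return [catalog_titles[i] for i in order]
```

With a semantic embedder in place of the bag-of-words stand-in, a generated title that paraphrases a catalog item (sharing no keywords) can still retrieve it, which sparse TF-IDF/BM25 matching cannot do.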


Experiments

Metrics:

  • Hit Rate / Recall@5
  • Normalized Discounted Cumulative Gain (NDCG)@5


GLoSS vs ID-based models


  • Outperforms all ID-based baselines on
    • Recall (+52%) & NDCG (+42%)


GLoSS vs LLM-based benchmarks


  • Higher Recall than all LLM-based baselines.
  • Competitive NDCG metrics.


Dense retrieval greatly improves metrics


  • Dense retrieval outperforms sparse in catalog grounding.
    • +12% gains in NDCG
    • +3% gains in Recall


Strong metrics across user interaction lengths

[Plots: Recall and NDCG for short, medium, and long user sequences]


Takeaways

  1. High-quality text generation:
    • 4-bit quantized, LoRA-tuned LLaMA models; paged attention
  2. SOTA:
    • Beats all ID-based benchmarks on R@5, NDCG@5
    • Also outperforms LLM-based models on R@5
  3. Grounding generated text:
    • Semantic search significantly improves ranking and retrieval metrics
  4. Strong metrics across sequence lengths:
    • Short-, medium-, and long-sequence users all obtain high metrics.


Future directions


New risks from economically motivated producers

  1. Shilling attack: a seller introduces fake users to boost its item's visibility
    • LLMs are biased toward the latest tokens 🡪 recency-based shilling
  2. Semantic rewrite: a seller manipulates its item's metadata to increase visibility
    • Hard to detect; rewrites mimic natural descriptions


Publications


Part of this talk:�User Fairness

    • Oracle Efficient Algorithms for Groupwise Regret [ICLR 24]
    • Improving Minimax Group Fairness in Sequential Recommendation [ECIR 25]

Competition & Item Diversity

    • Producers Equilibria and Dynamics in Engagement-Driven Recommender Systems [TMLR 25]

LLM-based recommendation

    • GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation [OARS workshop KDD25]

Not part of this talk:

Algorithmic Fairness

    • Wealth Dynamics Over Generations: Analysis and Interventions [SaTML 23]

Game theory, online learning

    • Last-iterate Convergence for Symmetric, General-sum, 2×2 Games Under The Exponential Weights Dynamic [ALT 26]
    • One Shot Inverse Reinforcement Learning for Stochastic Linear Bandits [UAI 24]

Differential Privacy

    • Personalized Differential Privacy for Ridge Regression [NRL 25]


Thanks to all my co-authors!


Dr. Ashwin Pananjady

Dr. Aaron Roth

Dr. Juba Ziani

Dr. Sampath Kannan

Dr. Eshwar Ram Arunachaleswaran

Dr. Aleksandr V.  Petrov

Lokranjan Lakshmikanthan

Dr. Anders Kirk Uhrenholt

Dr. David Wardrope

Timos Korres

Varun Vangala

Dr. Jingyan Wang

Dr. Franziska Boenisch

Rakshit Naidu

Dr. Vidya Muthukumar

Jim James

Etash Guha

Guanghui Wang


Thank you, committee!

Dr. Jacob Abernethy

Dr. Vidya Muthukumar

Dr. Aaron Roth

Dr. Kai Wang

Dr. Juba Ziani (Advisor)

Committee Members


Questions


P1) User Fairness

    • Oracle Efficient Algorithms for Groupwise Regret [ICLR 24]
    • Improving Minimax Group Fairness in Sequential Recommendation [ECIR 25]

P2) Strategic creators & item catalog evolution

    • Producers Equilibria and Dynamics in Engagement-Driven Recommender Systems [TMLR 25]

P3) LLM-based recommendation

    • GLoSS: Generative Language Models with Semantic Search for Sequential Recommendation [OARS workshop KDD25]�