Bandit Learning in Matching Markets
Zilong Wang
Ph.D. candidate at Shanghai Jiao Tong University
Shuai Li
Associate Professor at Shanghai Jiao Tong University
Outline
Part 1: Two-sided Matching Markets
Matching markets
A matching market has two sides
Workers
Employers
Both sides have preferences over the other side
Worker side
Based on payment or prior familiarity with the task
Both sides have preferences over the other side
Employer side
Based on the skill levels of workers
A case study: Medical interns [Roth (1984)]
Medical interns (cont.)
Stable matching
Participants have no incentive to abandon their current partners, i.e., there is no blocking pair: a pair of agents who both prefer each other over their current partners
Alvin E. Roth and Lloyd S. Shapley jointly won the Nobel Prize in 2012 for their contributions to stable matching theory.
Blocking pair
There may be more than one stable matching
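To make the definition concrete, here is a minimal Python sketch (not from the talk) that tests a one-to-one matching for blocking pairs; the dictionary-based preference format is an assumption for illustration.

```python
def is_stable(matching, worker_prefs, employer_prefs):
    """Check a one-to-one matching for blocking pairs.

    matching: dict worker -> employer (assumed to match everyone).
    worker_prefs[w] / employer_prefs[e]: lists, most preferred first.
    """
    partner = {e: w for w, e in matching.items()}  # employer -> worker
    for w, e_now in matching.items():
        # Scan the employers that w strictly prefers to its current partner.
        for e in worker_prefs[w][:worker_prefs[w].index(e_now)]:
            # (w, e) blocks if e also prefers w to its current worker.
            if employer_prefs[e].index(w) < employer_prefs[e].index(partner[e]):
                return False
    return True

# Example: with these preferences, matching w1-e1 and w2-e2 is stable.
prefs_w = {"w1": ["e1", "e2"], "w2": ["e1", "e2"]}
prefs_e = {"e1": ["w1", "w2"], "e2": ["w1", "w2"]}
print(is_stable({"w1": "e1", "w2": "e2"}, prefs_w, prefs_e))  # True
```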
A-side optimal stable matching1
1The existence is proved by Gale and Shapley (1962).
A-side pessimal stable matching
How to find a stable matching?
Gale-Shapley (GS) algorithm
[Gale and Shapley (1962)]
Agents on one side independently propose to agents on the other side according to their preference ranking until no rejection happens
No rejection happens!
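A minimal sketch of proposer-side deferred acceptance, assuming complete preference lists and equally sized sides:

```python
from collections import deque

def gale_shapley(proposer_prefs, receiver_prefs):
    """Proposer-side deferred acceptance.
    Returns the proposer-optimal stable matching as dict proposer -> receiver."""
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to propose to
    current = {}                                  # receiver -> tentative proposer
    free = deque(proposer_prefs)
    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in current:
            current[r] = p                        # r tentatively accepts p
        elif rank[r][p] < rank[r][current[r]]:
            free.append(current[r])               # r trades up; old partner is free again
            current[r] = p
        else:
            free.append(p)                        # r rejects p; p will propose again later
    return {p: r for r, p in current.items()}
```

Run with the A-side as proposers, this returns the A-side optimal stable matching discussed above.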
Gale-Shapley (GS) algorithm: Case 2
Step 1
Step 2
Step 3
GS properties: Stability
GS properties: Time complexity
GS properties: Optimality
GS properties: Strategic behavior
Strategy-proof: it is optimal for each participant to report truthfully
Deviating results in sub-optimal assignments
GS properties: Strategic behavior (cont.)
1Assuming all other agents report truthfully
Extension with sets: Many-to-one markets
Preferences over sets: Responsiveness
Set 1
Set 2
Preferences over sets: Substitutability
Set 1
Set 2
Substitutable preferences: An example
Deferred acceptance (DA) for substitutability
The same properties as GS:
[Kelso and Crawford (1982); Roth (1984b); Roth and Sotomayor (1992)]
Step 1
Step 2
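A minimal sketch of worker-proposing deferred acceptance where each employer is represented by an abstract choice function; the `choice[e]` interface is an assumption for illustration (substitutability of these choice functions is what guarantees a stable outcome):

```python
def deferred_acceptance(worker_prefs, choice):
    """worker_prefs[w]: employers, most preferred first.
    choice[e]: maps a set of workers to the subset employer e keeps."""
    next_idx = {w: 0 for w in worker_prefs}
    held = {e: frozenset() for e in choice}       # tentatively accepted workers
    free = list(worker_prefs)
    while free:
        # Each free worker proposes to its next-most-preferred employer.
        proposals = {e: set(held[e]) for e in choice}
        for w in free:
            proposals[worker_prefs[w][next_idx[w]]].add(w)
            next_idx[w] += 1
        # Each employer keeps its chosen subset; the rest are rejected,
        # possibly displacing previously held workers.
        rejected = set()
        for e in choice:
            keep = frozenset(choice[e](frozenset(proposals[e])))
            rejected |= proposals[e] - keep
            held[e] = keep
        free = [w for w in rejected if next_idx[w] < len(worker_prefs[w])]
    return held

# Example: a responsive (hence substitutable) employer with capacity q
# simply keeps its q highest-ranked applicants.
def responsive(ranking, q):
    return lambda applicants: frozenset(sorted(applicants, key=ranking.index)[:q])
```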
Summary of Part 1: Two-sided matching markets
But agents usually have unknown preferences in practice
These can be learned from iterative interactions!
Part 2: Multi-armed Bandits
What are bandits? [Lattimore and Szepesvári, 2020]
To accumulate as much reward as possible, which arm would you choose next?
Exploitation vs. Exploration
Time  |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
Arm 1 | $1 | $0 |    |    | $1 | $1 | $0 |    |    |    |
Arm 2 |    |    | $1 | $0 |    |    |    |    |    |    |
Interactive machine learning
The learning agent interacts with the environment in a loop:
(1) Faced with candidate actions
(2) Choose an action
(3) The environment generates feedback
(4) Receive the feedback
(5) Improve the policy
Provides insights for agents in matching markets to learn their unknown preferences through iterative interactions
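A minimal sketch of this loop with a hypothetical Bernoulli environment; the uniform-random choice stands in for a real policy:

```python
import random

def interact(num_actions=3, horizon=10, seed=0):
    rng = random.Random(seed)
    means = [rng.random() for _ in range(num_actions)]  # hidden from the agent
    history = []
    for _ in range(horizon):
        action = rng.randrange(num_actions)             # (1)-(2) face actions, choose one
        feedback = rng.random() < means[action]         # (3) environment generates feedback
        history.append((action, feedback))              # (4) receive feedback
        # (5) improve policy: a real learner would update its estimates here
    return history
```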
Applications
Recommendation systems
[Li et al., 2010]
SAT solvers
[Liang et al., 2016]
Advertisement placement
[Yu et al., 2016]
Key part of reinforcement learning
[Hu et al., 2018]
Monte Carlo Tree Search (MCTS) in AlphaGo
[Kocsis and Szepesvári, 2006; Silver et al., 2016]
Public health: COVID-19 border testing in Greece
[Bastani et al., 2021]
Multi-armed bandits (MAB)
Arms: items, products, movies, companies, …
Expected rewards: CTR, preference value, …
Feedback: click information, satisfaction, …
Objective
Explore-then-commit (ETC) [Garivier et al., 2016]
A/B testing
Explore-then-commit (cont.)
Exploration: pull each arm equally often and form its sample mean
Exploitation: commit to the empirical best arm
Hoeffding’s inequality controls the estimation error of the sample means
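A minimal ETC sketch with Bernoulli arms; the exploration length m is left to the caller (Garivier et al. (2016) analyze how to tune it):

```python
import random

def explore_then_commit(means, m, horizon, seed=0):
    """means[i]: Bernoulli mean of arm i (unknown to the learner)."""
    rng = random.Random(seed)
    K = len(means)
    pull = lambda i: 1.0 if rng.random() < means[i] else 0.0
    sums = [0.0] * K
    # Exploration: pull every arm m times, like an A/B test.
    for i in range(K):
        for _ in range(m):
            sums[i] += pull(i)
    # Commit to the empirical best arm; by Hoeffding's inequality each
    # sample mean is within sqrt(log(1/delta) / (2m)) of its true mean w.h.p.
    best = max(range(K), key=lambda i: sums[i] / m)
    # Exploitation: play the chosen arm for the remaining rounds.
    reward = sum(sums) + sum(pull(best) for _ in range(horizon - K * m))
    return best, reward
```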
Upper confidence bound (UCB) [Auer et al., 2002]
Upper confidence bound (UCB) = sample mean (exploitation) + exploration bonus given by Hoeffding’s inequality
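A minimal UCB1 sketch; the bonus term sqrt(2 ln t / n_i) is the standard Hoeffding-based confidence radius from Auer et al. (2002):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """UCB1 with Bernoulli arms: play argmax of mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    K = len(means)
    pull = lambda i: 1.0 if rng.random() < means[i] else 0.0
    counts, sums = [0] * K, [0.0] * K
    for i in range(K):                      # pull each arm once to initialize
        sums[i] += pull(i)
        counts[i] = 1
    for t in range(K + 1, horizon + 1):
        # Index = sample mean (exploitation) + confidence radius (exploration).
        i = max(range(K), key=lambda j: sums[j] / counts[j]
                + math.sqrt(2 * math.log(t) / counts[j]))
        sums[i] += pull(i)
        counts[i] += 1
    return counts                           # most pulls concentrate on the best arm
```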
Upper confidence bound (UCB) (cont.)
Improve ETC: Elimination [Audibert and Bubeck, 2010]
Improve ETC: Elimination (cont.)
Uniform exploration
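A minimal elimination sketch in the spirit of Audibert and Bubeck (2010), using a simplified fixed-confidence radius (an assumption for illustration): explore the surviving arms uniformly, then drop any arm whose upper confidence bound falls below the best lower confidence bound.

```python
import math
import random

def eliminate(means, horizon, delta=0.05, seed=0):
    rng = random.Random(seed)
    pull = lambda i: 1.0 if rng.random() < means[i] else 0.0
    active = list(range(len(means)))
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    t = 0
    while t < horizon and len(active) > 1:
        for i in active:                     # uniform exploration over survivors
            sums[i] += pull(i)
            counts[i] += 1
            t += 1
        # Simplified confidence radius from Hoeffding's inequality.
        rad = {i: math.sqrt(math.log(2 / delta) / (2 * counts[i])) for i in active}
        best_lcb = max(sums[i] / counts[i] - rad[i] for i in active)
        active = [i for i in active if sums[i] / counts[i] + rad[i] >= best_lcb]
    return active                            # surviving candidate(s) for the best arm
```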
Lower bound [Lai and Robbins, 1985]
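For reference, the bound can be stated as follows (a standard form of the result; here Δ_i = μ* − μ_i, ν_i is arm i's reward distribution, ν* is the optimal arm's, and KL is the Kullback–Leibler divergence):

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\ln T}
  \;\ge\; \sum_{i \,:\, \Delta_i > 0} \frac{\Delta_i}{\mathrm{KL}(\nu_i, \nu^{\ast})}
```

In particular, no consistent algorithm can do better than logarithmic regret, which is why the O(log T)-type guarantees above are order-optimal.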
Bandit learning in matching markets [Liu et al., 2020]
Satisfaction over this matching experience
For simplicity, assume arms know their preferences
Conflict resolution: One-to-one setting
Objective
Challenge in matching markets
How to control agents’ blocking behavior?
Summary of Part 2: Multi-armed bandits
Part 3: Bandit Algorithms in Matching Markets
Outline
Warm up: Centralized ETC [Liu et al., 2020]
Exploration → GS with estimated ranking → Exploitation
Remaining rounds: follow GS’s choice
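A minimal sketch of the centralized ETC idea, assuming N players and N arms with Bernoulli rewards, a conflict-free round-robin exploration schedule, known arm preferences, and reusing the `gale_shapley` sketch from Part 1:

```python
import random

def centralized_etc(true_means, arm_prefs, m, seed=0):
    """true_means[p][a]: player p's (unknown) mean reward for arm a.
    arm_prefs[a]: arms' known preference lists over players 0..N-1."""
    rng = random.Random(seed)
    N = len(true_means)
    sums = [[0.0] * N for _ in range(N)]
    # Exploration: in round t, player p pulls arm (p + t) % N, so the
    # platform's schedule is conflict-free and each pair is sampled m times.
    for t in range(m * N):
        for p in range(N):
            a = (p + t) % N
            sums[p][a] += 1.0 if rng.random() < true_means[p][a] else 0.0
    # Rank arms by sample means, then follow GS's choice for the rest.
    est_prefs = {p: sorted(range(N), key=lambda a: -sums[p][a]) for p in range(N)}
    return gale_shapley(est_prefs, arm_prefs)
```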
Unique stable matching
Regret type | Regret bound | Uniqueness condition | References |
Unique stable matching | | Serial dictatorship | [Sankararaman et al., 2021] |
Unique stable matching | | | [Maheshwari et al., 2022] |
Unique stable matching | | Uniqueness consistency (the most general) | [Basu et al., 2021] |
How to balance exploration and exploitation in a more appropriate way?
PhasedETC [Basu et al., 2021]
phase 1 | phase 2 | … (T rounds in total)
Phase length grows exponentially
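A small sketch of the schedule: the doubling phase lengths follow the slide, while the per-phase explore/exploit split below is a hypothetical placeholder, not the paper's exact choice.

```python
def phase_schedule(horizon):
    t, k, phases = 0, 0, []
    while t < horizon:
        length = min(2 ** k, horizon - t)    # phase length grows exponentially
        explore = max(1, length // (k + 2))  # placeholder split: shrinking share
        phases.append({"phase": k, "explore": explore, "exploit": length - explore})
        t += length
        k += 1
    return phases
```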
PhasedETC: Regret analysis
Over T rounds: Explore → GS + exploit → Explore → GS + exploit → …
Enough exploration to learn preferences, but the bound contains an exponentially large term
Explore-then-GS (ETGS) [Kong and Li, 2023]
phase 1 | phase 2 | … (T rounds; phase length grows exponentially)
Once players communicate and find that all of them estimate their preferences well: GS + exploit
ETGS implementation: Communication
Remark: it suffices for each player to identify its top N+1 ranked arms in order to find the player-optimal stable matching.
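A minimal sketch of the stopping test this remark induces: with per-arm confidence intervals [lcb, ucb], a player has "estimated well" once its top arms are provably ordered above everything below them (the function and names here are illustrative assumptions):

```python
def top_arms_identified(lcb, ucb, num_top):
    """lcb[a], ucb[a]: this player's confidence interval for arm a's mean.
    True once the top `num_top` arms and their order are provably fixed."""
    order = sorted(range(len(lcb)), key=lambda a: -lcb[a])
    for i in range(num_top):
        a = order[i]
        # Arm a must provably beat every arm currently ranked below it.
        if any(lcb[a] < ucb[b] for b in order[i + 1:]):
            return False
    return True

# Per the remark above, exploration can stop for a player once this holds
# with num_top = N + 1.
```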
ETGS implementation: Communication (cont.)
[Figure: in each communication round, each player signals whether it estimates its preferences well through which arm it selects]
ETGS implementation: Communication (cont.)
[Figure: a player that already estimates well always selects the same arm in the communication round]
ETGS: Regret analysis [Kong and Li, 2023]
Improvement: Adaptive ETGS
ETGS: Explore → Exploit, where the switch depends on all players.
Adaptive ETGS: each player alternates Explore → Exploit → Explore → Exploit → …, where the switch depends on the player itself; a player resumes exploring once rejected by the exploited arm.
Adaptive ETGS: Strategic behavior
Explore → Exploit
A player has identified its optimal arm. What should it report?
What if it reports that it has not?
What if it reports a non-optimal arm?
Adaptive ETGS: Regret [KWL, NeurIPS 2024]
Once an optimal arm has been identified: stop exploring!
Instantly eliminate the K − N sub-optimal arms, while maintaining N arms for cooperative exploration to avoid conflicts.
Lower bound [Sankararaman et al., 2021]
Lower bound (cont.)
Lower bound (cont.)
Regret type | Regret bound | Communication type | Strategy-proofness | References |
Player-optimal | | Centralized | ? | [Liu et al., 2020] |
Player-pessimal | | Centralized | ? | [Liu et al., 2020] |
Player-pessimal | | Decentralized, observed matching outcomes | No | [Liu et al., 2021] |
Player-pessimal | | Decentralized, observed matching outcomes | No | [Kong et al., 2022] |
Unique | | Decentralized | ? | [Sankararaman et al., 2021; Basu et al., 2021; Maheshwari et al., 2022] |
Unique | | Centralized | ? | [Wang and Li, 2024; Kong et al., 2024] |
Optimal stable bandits (Unique) | | Decentralized | / | [Sankararaman et al., 2021] |
Player-optimal | | Decentralized | ? | [Basu et al., 2021] |
Player-optimal | | Decentralized, observed matching outcomes | No | [Kong and Li, 2023] |
Player-optimal | | Decentralized | No | [Zhang et al., 2022] |
Player-optimal | | Decentralized | Yes | [Kong et al., 2024] |
Indifference stable | | Decentralized | ? | [Kong et al., 2025] |
How about many-to-one markets?
Extension of one-to-one algorithms
Centralized ETC/UCB [Wang et al., 2022]
Decentralized UCB [Wang et al., 2022]
ETGS [Kong and Li, 2024]
Results in the same regret upper bounds
Responsiveness: Regret
Many-to-one markets: Substitutability
Substitutability: Algorithm [Kong and Li, 2024]
Step 1 of arm-proposal DA ↔ Step 1 of the online algorithm
All players determine which arm is optimal, then enter the next step
Substitutability: Theoretical analysis
The first result for combinatorial preferences
Many-to-one markets: Results overview
Setting | Regret type | Regret bound | Communication type | Strategy-proofness | References |
Responsiveness | Player-optimal | | Centralized | ? | [Wang et al., 2022] |
Responsiveness | Player-pessimal | | Centralized | ? | [Wang et al., 2022] |
Responsiveness | Player-pessimal | | Decentralized, observed matching outcomes | No | [Wang et al., 2022] |
Responsiveness | Player-optimal | | Decentralized, observed matching outcomes | No | [Kong and Li, 2024] |
Responsiveness | Player-optimal | | Decentralized, observed matching outcomes | Yes | [Kong and Li, 2024] |
Responsiveness | Player-optimal | | Decentralized | No | [Zhang and Fang, 2024] |
Substitutability | Player-pessimal | | Decentralized, observed matching outcomes, known arms’ preferences | ? | [Kong and Li, 2024] |
Other setting variants
Summary of Part 3: Bandit algorithms in matching markets
Part 4: Open Problems
Open problems: Matching markets
Open problems: Regret
Regret type | Regret bound | Communication type | References |
Optimal stable bandits (unique stable matching) | | Decentralized | [Sankararaman et al., 2021] |
Player-optimal stable matching | | Decentralized | [Kong et al., 2024] |
There is a gap between these two bounds.
Open problems: Regret (cont.)
Setting | Regret type | Regret bound | Communication type | References |
Responsiveness | Player-optimal stable matching | | | [Kong and Li, 2024] |
Responsiveness | Player-optimal stable matching | | Decentralized, known matching outcomes | [Kong and Li, 2024] |
Substitutability | Player-pessimal stable matching | | Decentralized, known matching outcomes | [Kong and Li, 2024] |
Is the player-optimal stable matching achievable?
What is the optimal regret under responsiveness?
Open problems: Regret & Strategic behavior
Regret type | Regret Bound | Strategy-proof | References |
Player-optimal stable matching | | No | [Kong and Li, 2023; Zhang et al., 2022] |
Player-optimal stable matching | | Yes | [Kong and Li, 2024] |
Open problems: Matching markets (cont.)
Thanks! & Questions?
Credit: Some images are from Flaticon.com
Zilong Wang
Shuai Li
References: Part 1
References: Part 2
References: Part 3
References: Part 4