1 of 54

Hanzhi Wang

The University of Melbourne

Sublinear-Time Algorithms for Random-Walk Probability Estimation on Large Graphs

11 December 2025

0

2 of 54

Outline

Background

2

Contributions and Techniques

4

Problem Definitions

3

Overview

1

Future Work

5

1

3 of 54

Outline

Background

2

Techniques

4

Problem and Contributions

3

Overview

1

Future Work

5

2

4 of 54

Background

V.S.

Acquiring more servers.
Cons:

1. Hardware Advances

2. Algorithm Improvements

Developing efficient algorithms
Pros:

How to Process Data at a Massive Scale?

Trend of Electricity Consumption

enormous costs,

huge energy consumption

cost-saving,

energy-efficient

3

5 of 54

Background

Performance gains due to the algorithmic improvement of Maximum Subarray problem

MIT Computer Science & Artificial Intelligence Laboratory:

Performance gains due to algorithmic improvements have exceeded those due to hardware advances.

Hardware

Advances

The impact of algorithmic improvement for Polynomial Factorization

4

6 of 54

Exponential Time：
Polynomial Time：
Linear Time：

Inefficient

Efficient

Inefficient

Efficient

Classical Notion.

The Massive Data Volume Challenges

the Classical Notion of Efficient Algorithms

Background

Big Data Notion

5

7 of 54

Background

In the Early Stage

Contemporary Internet

The Massive Data Volume Challenges

the Classical Notion of Efficient Algorithms

completed in 3 years using 100 servers!

completed in 10 s with a single server

6

8 of 54

Designing Scalable Algorithms for P Problems

Motivations

Class P

Class NP

Class S

Solvable

in Polynomial Time

Capable

to handle big inputs

(nearly linear or sub-linear)

Verifiable

in Polynomial Time

❓

Focus: Sublinear-Time Algorithm for Random-Walk Probability

Estimation on Large Graphs

7

9 of 54

Outline

Background

2

Contributions and Techniques

4

Problem Definitions

3

Overview

1

Future Work

5

8

10 of 54

Graphs

Social Network

Protein Structure

Trade Network

Transportation Network

a node

an edge

9

11 of 54

Random Walks on Graphs

1

4

3

2

5

6

7

9

10

8

11

12

10

12 of 54

1

4

3

2

5

6

7

9

10

8

11

12

11

13 of 54

PageRank

Uniformly Sampling a Source Node …

1

4

3

2

5

6

7

9

10

8

11

12

1

4

3

2

5

6

7

9

10

11

12

8

Sampled!

12

14 of 54

Personalized PageRank (PPR)

1

4

3

2

5

6

7

9

10

8

11

12

Given a

Source Node

PageRank

PPR

13

15 of 54

Applications

Graph Theory and Network Analysis

Graph Centrality Evaluation [FOCS’18, SICOMP’23]
Friend Recommendation (Twiter Who-to-Follow) [WWW’13]
Local Graph Clustering [FOCS’06]
Graph Neural Networks [NeurIPS’20]

neural

network

14

16 of 54

Outline

Background

2

Contributions and Techniques

4

Problem Definitions

3

Overview

1

Future Work

5

15

17 of 54

Query Types

Single-Node Random-Walk Probability Computations

Single-Target Random-Walk Probability Computations

Single-Source Random-Walk Probability Computations

Random-Walk Meeting Probability Computations

16

18 of 54

Query Types

Given a

Target Node

Goal:

Single-Node Random-Walk Probability Computations

Single-Target Random-Walk Probability Computations

Single-Source Random-Walk Probability Computations

Random-Walk Meeting Probability Computations

Uniformly Sampled Source Node

17

19 of 54

Query Types

Given a

Target Node

Single-Node Random-Walk Probability Computations

Single-Target Random-Walk Probability Computations

Single-Source Random-Walk Probability Computations

Random-Walk Meeting Probability Computations

18

20 of 54

Query Types

Single-Node Random-Walk Probability Computations

Single-Target Random-Walk Probability Computations

Single-Source Random-Walk Probability Computations

Random-Walk Meeting Probability Computations

19

21 of 54

Query Types

Single-Node Random-Walk Probability Computations

Single-Target Random-Walk Probability Computations

Single-Source Random-Walk Probability Computations

Random-Walk Meeting Probability Computations

Meet!

20

22 of 54

Outline

Background

2

Contributions and Techniques

4

Problem Definitions

3

Overview

1

Future Work

5

21

23 of 54

Contributions Summary

[SIGMOD’20, VLDBJ]

Meeting Prob.

Single-Source

[VLDB’22]

All are first-authored papers or listed alphabetically following the TCS convention.

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

22

24 of 54

Contributions Summary

[SIGMOD’20, VLDBJ]

Meeting Prob.

Single-Source

[VLDB’22]

All are first-authored papers or listed alphabetically following the TCS convention.

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

23

25 of 54

Contributions Summary

Single-Target

[STOC’24, KDD’21, KDD’20]

All are first-authored papers or listed alphabetically following the TCS convention.

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

Node Push

Edge Push

Power Method

Monte Carlo

24

26 of 54

The Power Method [WWW’98]

PageRank/PPR vector

Initial Distribution

Damping Factor

Transition Matrix

25

27 of 54

The Power Method [WWW’98]

26

28 of 54

Techniques

Node Push

Edge Push

Power Method

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

Monte Carlo

27

29 of 54

Problem 1: Single-Node PageRank Estimation

Given a

Target Node

Goal:

Uniformly

Sampled!

28

30 of 54

The Monte-Carlo Method [InternetMath ’05]

1

4

3

2

5

6

7

9

10

8

11

12

1

4

3

2

5

6

7

9

10

11

12

8

Sampled!

Uniformly Sampling a Source Node …

Uniformly Sampling a Source Node …

Uniformly Sampling a Source Node …

29

31 of 54

Challenges: Single-Node PageRank Estimation

30

32 of 54

Our Contributions:

Single-Node PageRank on Undirected Graphs

Hanzhi Wang: Revisiting Local PageRank Estimation on Undirected Graphs: Simple and Optimal. [KDD 2024]

KDD’14, WAW’15

Ours, KDD’24

Monte Carlo

Optimal !

Symmetry of Random-Walk Prob. on Undirected Graphs:

31

33 of 54

Challenges: Single-Node PageRank on Directed Graphs

How about directed graphs?

32

34 of 54

Techniques

Node Push

Edge Push

Power Method

Monte Carlo

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

33

35 of 54

Node Push [WAW’07]

34

36 of 54

Push + Monte Carlo [WSDM’16]

Pushing probability mass along in-edges

35

37 of 54

Push + Monte Carlo [WSDM’16]

Push

Balance

36

38 of 54

Our Contributions

Single-Node PageRank Estimation on Directed Graphs

WSDM 2016

SICOMP 2023

STOC 2024 (Ours)

Upper Bound

Lower Bound

FOCS 2018

Lower Bound

Hanzhi Wang, Zhewei Wei, Ji-Rong Wen, Mingji Yang, Revisiting Local Computation of PageRank: Simple and Optimal. [STOC 2024]

37

For this problem, our STOC paper successfully closed the longstanding theoretical gap between the upper and lower bounds. Before our paper, the best-known upper bound is given in a SICOMP paper. Its upper bound is min{𝑛^(2/3) Δ_ ^(1/3), 𝑛^(5/6) 𝑑^(1/6) }, where n denotes the number of nodes in the given graph, Delta denotes the maximum out-degree of the graph, and d denotes the average degree of the graph. We improve the first term of the SICOMP paper’s bound from n^2/3 Delta^1/3 to n^1/2 Delta^1/2. We note that the sum of the exponents of n and Delta, while the exponent of n in our bound is smaller than that in the SICOMP result. Ours is 2/3, while the sicomp one is 3/4. It’s worth mentioning that Delta is the maximum degree, that can be no large than n. For the first term, we reduced the exponent of n, thus our bound is asymptotically smaller than the SICOMP bound. Analogously, we improve the second term in the sicomp bound from n^6/7 d^1/7 to n^ 5/6 to d^1/6, where d is the average degree, and thus is much smaller than n. We reduce the exponent of n in the second term from 5/6 to 1/6. Thus, our bound is better than the previous one.

39 of 54

Our Contributions

Upper Bound

Lower Bound

SODA 2026 (Ours)

STOC 2024

Mikkel Thorup, Hanzhi Wang, Zhewei Wei, Mingji Yang, PageRank Centrality in Directed Graphs with Bounded In-Degree. [SODA’26]

38

40 of 54

Techniques

Node Push

Edge Push

Power Method

Monte Carlo

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

39

41 of 54

Problem 2: Single-Target PPR Estimation

Given a

Target Node

40

42 of 54

Open Problem

Upper Bound

Lower Bound

41

43 of 54

Our Contributions

Optimal Algorithm for Single-Target PPR Estimation

Negative Answer to the Open Problem!

Ours, STOC 2024

Optimal under Graph Access Model

Hanzhi Wang, Zhewei Wei, Ji-Rong Wen, Mingji Yang, Revisiting Local Computation of PageRank: Simple and Optimal. [STOC 2024]

WAW 2007

Upper

Bound

Lower

Bound

42

44 of 54

Hard Instances

43

45 of 54

Techniques

Node Push

Edge Push

Power Method

Monte Carlo

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

44

46 of 54

Our Contributions

45

47 of 54

Our Contributions

Optimal Algorithm for Single-Target PPR Estimation

Hanzhi Wang, Zhewei Wei et al., Personalized PageRank to a Target Node, Revisited. [KDD 2020]

Positive Answer to the Open Problem!

WAW 2007

Ours, KDD 2020

Optimal under Degree-Sorted Model

Upper

Bound

Lower

Bound

46

48 of 54

Techniques

Node Push

Edge Push

Power Method

Monte Carlo

Single-Target

[STOC’24, KDD’21, KDD’20]

Single-Node

[SODA’26, STOC’24, KDD’24, VLDB’23]

47

49 of 54

Outline

Background

2

Contributions and Techniques

4

Problem Definitions

3

Overview

1

Future Work

5

48

50 of 54

Future Work

Single-Pair Random-Walk Probability Estimation

Upper Bound

❓

Lower Bound

Given a Source Node

Goal:

Given a

Target Node

Solved

49

51 of 54

Future Work

Dynamic Random-Walk Probability Estimation

add an edge

recompute

update

❌

Random-Walk Probability after Update:

no easy-to-apply error guarantees

❓

50

52 of 54

Future Work

v.s.

51

53 of 54

Future Work

Scalable

Algorithms

Dynamic

Algorithms

Practical

Algorithms

52

54 of 54

Thanks!

53