1 of 32

Yanhui Zhu, Samik Basu, A Pavan

yanhui@iastate.edu

October 24, 2024

Department of Computer Science

Iowa State University

Regularized Unconstrained Weakly Submodular Maximization

Yanhui Zhu

2 of 32

2

Big Data

Pictures

Audios

Social networks

Videos

Introduction

The Internet

Yanhui Zhu

3 of 32

3

Workflow

Introduction

Cleaning Normalization

Scaling

Encoding

Preprocessing

Feature Selection

Active Learning

Text Summarization

Influence Maximization

Clustering

Algorithms

Important Features

Labeled Data

Summarization

Influential Users

Clustered Labels

Decision

Yanhui Zhu

4 of 32

4

Workflow

Introduction

Feature Selection

Active Learning

Text Summarization

Influence Maximization

Clustering

Algorithms

Also,

optimization problems

Optimization objectives are

Submodular

General algorithms to solve a class of tasks with theoretical guarantees.

Submodular Optimization

Goal

Yanhui Zhu

5 of 32

5

Workflow

Introduction

Feature Selection

Active Learning

Text Summarization

Influence Maximization

Clustering

Algorithms

Submodular Optimization

Theoretically rigorous

Explainable

Cost-effective

Extendable

General algorithms to solve a class of tasks with theoretical guarantees.

Goal

Yanhui Zhu

6 of 32

6

Data summarization

More Applications

Introduction

Social network analysis

k-medoid clustering

Yanhui Zhu

7 of 32

7

Submodular Function Optimization

  • An example: property of submodularity
  • Formal Definitions

2

Yanhui Zhu

8 of 32

8

Influence Diffusion

a

c

f

d

g

b

e

i

h

0.4

0.4

0.5

0.2

0.5

0.3

0.3

0.2

0.4

0.3

0.1

0.1

0.4

 

(expected number of users influenced by a seed set X)

Yanhui Zhu

9 of 32

9

Influence Diffusion

a

c

f

d

g

b

e

i

h

0.4

0.4

0.5

0.2

0.5

0.3

0.3

0.2

0.4

0.3

0.1

0.1

0.4

 

 

(expected number of users influenced by a seed set X)

Yanhui Zhu

10 of 32

10

Influence Diffusion

a

c

f

d

g

b

e

i

h

0.4

0.4

0.5

0.2

0.5

0.3

0.3

0.2

0.4

0.3

0.1

0.1

0.4

 

 

 

 

(expected number of users influenced by a seed set X)

submodular

modular

Yanhui Zhu

11 of 32

11

 

Submodular Function Optimization

Submodular Set Functions

Yanhui Zhu

12 of 32

12

Regularized Unconstrained Weakly Submodular Maximization

3

Yanhui Zhu

13 of 32

13

 

 

Identify influencers to maximize the profit (revenue - cost).

 

Profit maximization (Regularized submodular maximization)

Cost constrained submodular maximization

An Instance

a

c

f

d

g

b

e

i

h

0.4

0.4

0.5

0.2

0.5

0.3

0.3

0.2

0.4

0.3

0.1

0.1

0.4

 

 

Yanhui Zhu

14 of 32

14

 

Regularized Submodular Maximization

Regularized Submodular Maximization

submodular f : data summarization, recommendations, k-medoid clustering, sensor placement, etc.

Yanhui Zhu

15 of 32

15

  • ROI: best approximation for Problem 1, but quadratic runtime
  • UDG: linear runtime, but randomized algorithm and worse approximation.

Regularized Submodular Maximization

Existing algorithms

Can we design an algorithm 1) with good approximation (better than UDG), and

2) faster runtime (faster than ROI)?

Reference

Approximation Guarantees

Time Comp.

Note

ROI

[Jin et al., 2021]

Deterministic

UDG

[Harshaw et al., 2019]

Randomized

OPT: optimal solutions.

Yanhui Zhu

16 of 32

16

Regularized Submodular Maximization

Regularized Submodular Maximization - Results

 

Yanhui Zhu

17 of 32

17

Reference

Approximation Guarantees

Time Comp.

Note

ROI

[Jin et al., 2021]

Deterministic

UDG

[Harshaw et al., 2019]

Randomized

UP

[Our algorithm]

Deterministic

  • Efficiency: Faster, only slightly worse than the tight guarantee (Jin et al., 2021).
  • Scalability and Extendability: Can handle the case when f is not submodular, but only weakly submodular or approximately submodular. (Jin et al., 2021 and Harshaw et al., 2019 cannot.)

Regularized Submodular Maximization

Regularized Submodular Maximization - Results

 

Yanhui Zhu

18 of 32

18

 

Regularized Submodular Maximization

Weakly Submodular f

Bayesian Optimal Experimental Design

Objective: Information gain (IG) of an experiment as the reduction in Shannon entropy from the prior to the posterior.

 

IG is weakly submodular. [Harshaw et al., 2019]

 

Yanhui Zhu

19 of 32

19

 

Regularized Submodular Maximization

Approximately Submodular f

 

Influence function using RR sets [Cohen et al., 2014] to estimate the influence spread.

Yanhui Zhu

20 of 32

20

Regularized Submodular Maximization

Experiments

 

Datasets:

Submodular f : vertex cover

Weakly submodular f : Information gain IG

Modular costs:

Generate modular costs for each vertex/datapoint.

Yanhui Zhu

21 of 32

21

Regularized Submodular Maximization

Experiments – submodular f

Eu-Email network

Protein interaction network

Yanhui Zhu

22 of 32

22

Regularized Submodular Maximization

Experiments – submodular f

Yanhui Zhu

23 of 32

23

Regularized Submodular Maximization

Experiments – weakly submodular f

Segment data

Boston housing data

Yanhui Zhu

24 of 32

24

Regularized Submodular Maximization

Experiments – weakly submodular f

Yanhui Zhu

25 of 32

25

Regularized Submodular Maximization

Conclusion

  • We proposed to study the regularized submodular maximization problem, which extensive applications than regular submodular maximization.
  • We developed a fast deterministic algorithm with good approximation guarantees.
  • Our algorithm can be further applied to weakly/approximately submodular utility functions, marking more real-world application scenarios.

Yanhui Zhu

26 of 32

26

Acknowledgements

  • My advisor: Prof. Pavan Aduri & Prof. Samik Basu.
  • CIKM Conference Program.
  • Travel support from Iowa State University.
  • NSF Travel Grant.

Yanhui Zhu

27 of 32

27

Questions

Yanhui Zhu

28 of 32

28

 

Optimal solution

Output of an approximation algorithm

 

Hardness

  • Most submodular optimization problems are NP-Hard
  • Approximation Algorithms

Introduction

Yanhui Zhu

29 of 32

29

Normalized utility function f :

Submodular Function Optimization

Monotone Set Functions

 

Yanhui Zhu

30 of 32

30

 

modular, knapsack, linear, cost, budget

Submodular Function Optimization

Modular Set Functions

Yanhui Zhu

31 of 32

31

Regularized Submodular Maximization

Hardness

NP-Hard

No constant approximation ratio

 

 

Yanhui Zhu

32 of 32

32

Yanhui Zhu