1 of 19

ADAPTING CLASSIC SCHEDULING HEURISTICS FOR ONLINE EXECUTION UNDER UNCERTAINTY  

KUBISHI RESEARCH GROUP

November 17, 2025

Jason Chamorro Loyola Marymount University

Gabriel Twigg-Ho Swinburne University of Technology 

Jared Coleman Loyola Marymount University

Tainã Coleman San Diego Supercomputer Center

Bhaskar Krishnamachari  University of Southern California

Mohammadali Khodabandehlou  University of Southern California

Loyola Marymount

University

2 of 19

 

 

 

 

 

 

 

 

 

 

Task Graph

Compute Network

 

 

 

 

 

 

 

 

 

 

Schedule

 

2

Loyola Marymount

University

3 of 19

COMPARING SCHEDULING ALGORITHMS

  • Scheduling is NP-Hard and not approximable, but what about for practical situations?
    • Scheduling an ML workflow
    • Running a scientific workflow on a supercomputing system
    • Collecting and processing data from an IoT system
    • Running analyses in a tactical edge environment
  • Many tasks scheduling heuristic algorithms have been proposed
    • HEFT, CPoP, BIL, ETF, FCP, FLB, GDL, MinMin, MaxMin, MCT, MET, OLB, SMT, WBA, …
  • Which algorithms works best in practice?

3

Loyola Marymount

University

4 of 19

COMPARING SCHEDULING ALGORITHMS

HEFT Schedule

CPoP Schedule

Task Graph

Network

4

Loyola Marymount

University

5 of 19

COMPARING SCHEDULING ALGORITHMS

HEFT Schedule

CPoP Schedule

Task Graph

Network

0.5

0.5

5

Loyola Marymount

University

6 of 19

 

 

 

 

 

 

 

 

 

Task Graph - Online

Compute Network

stochastic task costs

stochastic data sizes

stochastic compute speeds

stochastic communication strengths

t7

unforeseen tasks

estimate task cost

actual task cost

6

Loyola Marymount

University

7 of 19

THE ONLINE PROBLEM

Hypothetical Schedule

Realized Schedule

  • Deterministic approach
    • Replace unknowns with estimates
    • Use classic scheduling heuristic
    • Commit entire schedule

Estimate task graph

7

Loyola Marymount

University

8 of 19

THE ONLINE PROBLEM

Perfect Information

Online

  • Fully online approach
    • Only consider tasks as they become available to schedule
    • Does not consider estimates of future costs

Actual task graph

8

Loyola Marymount

University

9 of 19

OUR APPROACH

Online Rescheduling - HEFT 

First task finished.

Once a task finishes, it cannot be rescheduled.

HEFT runs again scheduling around completed task.

Second task finished.

Online Rescheduling - HEFT 

First task finished.

Once a task finishes, it cannot be rescheduled.

HEFT runs again scheduling around completed task.

Second task finished.

Online Rescheduling - HEFT 

HEFT runs again scheduling around completed task.

Second task finished.

est. t1

est. t4

est. t2

est. t3

Compute node v1

Compute node v2

t1

est. t2

Compute node v1

Compute node v2

t1

est. t3

est. t2

est. t4

Compute node v1

Compute node v2

Estimated Schedule

Estimated Schedule

Realized Schedule

t1

est. t3

t2

Compute node v1

Compute node v2

Realized Schedule

t2

  • Generate an estimate schedule, reschedule as tasks complete based on observed information
  • Running tasks locked
  • Modular

9

Loyola Marymount

University

10 of 19

WORKFLOWS

Montage Workflow

  • WFCommons
    • Suite of 9 scientific applications
    • Tasks and edges fitted to common probability distributions
    • Computed nodes fitted to empirical distributions from execution traces
  • Applicable and large-scale

10

Loyola Marymount

University

11 of 19

EXPERIMENTAL SETUP

  • Computation-to-communication(CCR)
    • Varied from 0.2 to 5 to simulate compute-bound and communication-bound

Network Speed Estimate

Task Size Estimate

  • Estimator Method 
    • Mean: E[X] of each task's cost or network speed distribution
    • SHEFT-inspired:

11

Loyola Marymount

University

12 of 19

COMPARING ALGORITHMS

  • Heuristics: HEFT and CPoP
  • Schedulers
    • Online: Our proposed strategy. Reschedules as task durations are revealed
    • Naïve Online: One-shot estimation schedule
    • Offline: Assumes full knowledge of actual network and tasks weights
  • Evaluation – Normalized Makespan Ratio
    • Ratio closer 1 indicates performance closer to ideal scheduler

12

Loyola Marymount

University

13 of 19

RESULTS

13

Loyola Marymount

University

14 of 19

WORKFLOW SPECIFIC RESULTS

14

Loyola Marymount

University

15 of 19

CONCLUSION

  • Online Scheduling Framework: Enables classic offline heuristics to operate during runtime in heterogeneous distributed systems.
  • SHEFT adaptation: Extended estimator to networks.
  • Workflows: Proved the validity of online rescheduling on 9 applicable large-scale scientific workflows, improving mean makespan and variance in most cases.
  • Future Work:
    • Refined estimation methods that adapt to observations
    • Reinforcement learning integration
    • Multi-objective tradeoff
    • Computational overhead of rescheduling and time complexity
      • Selective rescheduling

ACKNOWLEDGEMENTS

  • This material is based upon work supported by the National Science Foundation under Award No. 2451267. This work was supported in part by Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196

15

Loyola Marymount

University

16 of 19

APPENDIX

16

Loyola Marymount

University

17 of 19

17

Loyola Marymount

University

18 of 19

PARAMETRIC SCHEDULING

  • Parametric scheduling framework.
    • 36 offline list-scheduling algorithms and their online counterparts
    • 4 modular components: Priority function, comparison function, insertion strategy, and critical path reservation
      • HEFT: Upward rank, earliest finish time, insertion-based, no critical path reservation
      • CPoP: CPoP rank, earliest finish time, insertion-based, critical path reservation

18

Loyola Marymount

University

19 of 19

19

Loyola Marymount

University