Adaptive Meta-scheduling for Computational Workloads
Abdulrahman Azab
azab@uio.no
Computational Grid
[Diagram: a computational grid, with a Broker connecting clients (C) that submit work and workers (W) that execute it.]
Inter-Grid Structure
(UNICORE, EGI, BOINC,…)
[Diagram: a central meta-scheduler submitting work across the underlying grids.]
Inter-Grid Structure
(Condor flocking)
How many connections?
For how long?
How many user authentications?
Inter-Grid Structure
(P2P Condor flocking, Grid Federation)
[Diagram: brokers B-1 through B-5 connected in a peer-to-peer overlay.]
Inter-Grid Architecture
(P2P Condor flocking, Grid Federation)
Inter-Grid Architecture
Grid as bus
Grid node as bus seat
Grid job as passenger
Load balancing
[Diagram: a Gateway balancing load across the grids, each grid running a Stroll-S instance.]
Slick Inter-Grid Architecture
[Diagram: the scheduler forwards jobs through Stroll-S instances to grids with vacant matching workers.]
usegalaxy.eu - Under the Hood
Being built from common components
[Architecture diagram: usegalaxy.eu. A portal and master node, backed by a DB server, a message queue (mq), Celery, a TIG monitoring stack, a stratum 1 server, and > 3 PB of NFS storage, dispatch jobs to BWCloud compute (Intel and AMD architectures, > 8000 CPU cores, > 42 TiB RAM, > 200 TiB storage, > 20 T4 GPUs, nodes node0 … node15) and to twelve remote Pulsar endpoints (Pulsar-1 in Italy, Pulsar-2 in Belgium, Pulsar-3 in France, …, Pulsar-12 in the UK, each with workers c001 … c005), with DIRAC and a Galaxy pilot factory for scheduling.]
Slick Design
[Architecture diagram: the SLICK design. A SLICK Gateway contains a gateway queue, a gateway scheduler, a starvation/far-data detector, an information service (IS) with a global store (GlS) and per-resource stores (RIS1 … RISn), a DSM, and a communication channel. Below it, a local scheduler with priority queues q0-q4 serves local clients (C) and local workers (W). A local broker and the gateway broker exchange receive/submit traffic over a job bus and an information bus through the neighborhood broker overlay (B).]
Inter-grid Job Submission
[Diagram: a job submitted to one grid's broker is forwarded across the broker overlay to another grid's broker, which allocates it to local workers (each grid: a Broker with clients C and workers W).]
Slick Scheduling Model
universe = java
……
requirements = OpSys == "LINUX" && Arch == "INTEL"
Image_size = 2800 Meg
…….
queue 60
[Diagram: the scheduling pipeline. A job description and an initial domain list (D1 … D50) feed the ClassAd Matchmaker, which yields a match list (D15, D22, D20, D1, D50). The Fuzzy Matchmaker orders that list (1-5), and a data-locality step re-ranks it by route length to each domain (e.g. r#3 - 1 hop, r#4 - 1 hop, r#1 - 1 hop, r#7 - 2 hops, r#9 - 4 hops).]
Fuzzy Matchmaker
1. Fuzzification of Inputs
[Plot: membership functions for domains D1, D2, D3 over # CPU Slots (0-3000), with degree of membership (0-1) on the y-axis. The job's CPU request fuzzifies to degrees 0.9, 0.65, and 0.24 for the three domains.]
1. Fuzzification of Inputs
[Plot: membership functions for D1, D2, D3 over Image_size (in units of 100 GB, 0-6). The job's image size fuzzifies to degrees 1.0, 0.75, and 0.36.]
[Plot: the membership functions for D1, D2, D3 over # CPU Slots with a threshold T; example degree 0.13.]
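To make step 1 concrete, here is a minimal Python sketch of fuzzification, assuming triangular membership functions with made-up breakpoints (the real curves are the ones plotted above):

def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical membership functions for "allocatable CPU slots" per domain;
# the real curves are the plotted ones.
cpu_membership = {
    "D1": lambda x: triangular(x, 0, 1000, 2000),
    "D2": lambda x: triangular(x, 500, 1500, 2500),
    "D3": lambda x: triangular(x, 1000, 2000, 3000),
}

job_cpu_slots = 1200
for domain, mu in cpu_membership.items():
    print(domain, round(mu(job_cpu_slots), 2))   # degree of membership per domain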
2. Applying Fuzzy Operator
A separate fuzzy rule is created for each domain, Di, as a function of job j, as follows:
IF CPUSlotsOf(j) IS AllocatableInCPUSlots(Di)
AND MemoryImageOf(j) IS AllocatableInVirtualMemory(Di)
THEN j IS AllocatableIn(Di)
Fuzzy AND is represented as Minimum(), so the rule weight is specified as:
weight(Di) = Minimum(degree of AllocatableInCPUSlots(Di), degree of AllocatableInVirtualMemory(Di))
3. Applying Implication Method
[Plot: implication output for the three domains; rule weights D1 = 0.52, D2 = 0.9, D3 = 0.36.]
4. Defuzzification
Output = MAX(weight[D1], weight[D2], weight[D3])
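A small Python sketch of steps 2-4, using illustrative membership degrees loosely based on the plots (the pairing of degrees to domains is an assumption):

# Degrees from step 1 (illustrative values).
cpu_degree = {"D1": 0.9, "D2": 0.65, "D3": 0.24}   # AllocatableInCPUSlots
mem_degree = {"D1": 0.36, "D2": 1.0, "D3": 0.75}   # AllocatableInVirtualMemory

# Step 2: fuzzy AND as Minimum() gives each domain's rule weight.
weight = {d: min(cpu_degree[d], mem_degree[d]) for d in cpu_degree}

# Step 4: defuzzify by picking the domain with the maximum weight.
best = max(weight, key=weight.get)
print(weight)
print("allocate to", best)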
Binary Matchmaking
[Diagram: each worker in the grid reports its ClassAd metadata to the Broker:]
Name = slot1@52016.uis.no
VirtualMemory = 34241
TotalDisk = 3704876
Disk = 34951
CondorLoadAvg = 0.000000
LoadAvg = 0.590000
KeyboardIdle = 6
ConsoleIdle = 6
Memory = 77
Cpus = 1
Arch = "INTEL"
OpSys = "WINNT51“
……………………..
Workers Metadata
Matchmaking
Worker Attributes:
HasJava = TRUE
VirtualMemory = 34241
Memory = 77
Arch = "INTEL"
OpSys = "LINUX"
……………………..
Job Attributes:
universe = java
……
requirements = OpSys == "LINUX" && Arch == "INTEL"
Image_size = 28 Meg
…….
→ MATCH
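A toy Python sketch of the idea (not the real ClassAd evaluator): the job's requirements expression is evaluated with the worker's attributes in scope. Condor's && is written as Python's `and` here.

worker = {
    "HasJava": True,
    "VirtualMemory": 34241,
    "Memory": 77,
    "Arch": "INTEL",
    "OpSys": "LINUX",
}

def matches(worker_ad, requirements):
    """Evaluate a requirements expression against one worker's attributes."""
    return bool(eval(requirements, {"__builtins__": {}}, dict(worker_ad)))

job_requirements = 'OpSys == "LINUX" and Arch == "INTEL"'
print(matches(worker, job_requirements))   # True -> MATCH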
Sharing resource information
[Diagram: four grids (each a Broker with clients C and workers W) exchanging resource information across the overlay.]
Lots of data: > 1 GB for 10,000 workers
Bitwise operations are much faster
Binary representation of Job and Worker Attributes
Boolean Attributes
Attribute i is TRUE → bi = 0
Multi-valued Attributes
Arch      | Code
INTEL     | 0001
IA64      | 0010
ALPHA     | 0011
SGI       | 0100

OpSys     | Code
LINUX     | 0001
WINNT51   | 0010
WINNT52   | 0011
SOLARIS25 | 0100

State     | Code
Unclaimed | 0001
Claimed   | 0010
Owner     | 0011
Matched   | 0100
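A Python sketch of this encoding, assuming the 4-bit codes are packed into a 64-bit integer (the multi-valued record M of the Attribute Records slide below); field positions are an assumption:

ARCH = {"INTEL": 0b0001, "IA64": 0b0010, "ALPHA": 0b0011, "SGI": 0b0100}
OPSYS = {"LINUX": 0b0001, "WINNT51": 0b0010, "WINNT52": 0b0011, "SOLARIS25": 0b0100}
STATE = {"Unclaimed": 0b0001, "Claimed": 0b0010, "Owner": 0b0011, "Matched": 0b0100}

FIELD_BITS = 4

def set_field(m, index, code):
    """Write a 4-bit attribute code into field `index` of the 64-bit record m."""
    shift = index * FIELD_BITS
    m &= ~(0b1111 << shift)        # clear the field
    return m | (code << shift)     # write the new code

m = 0
m = set_field(m, 0, ARCH["IA64"])      # field 0: Arch = IA64
m = set_field(m, 1, OPSYS["LINUX"])    # field 1: OpSys = LINUX
print(f"{m:016x}")                     # the 64-bit M record in hex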
Range-valued Attributes
Attribute Records
[Diagram: a Job Attribute Record (JAR: Bj, Rj, Mj) and a Worker Attribute Record (WAR: Bw, Rw, Mw), each packing Boolean attributes (16 bits), range-valued attributes (128 bits), and multi-valued attributes (64 bits).]
Matchmaking
[Diagram: the WAR (Bw, Rw, Mw) and JAR (Bj, Rj, Mj) records, with 16-, 128-, and 64-bit fields, are combined bitwise; both conditions feed an AND gate:]
Match ⇔ ((BwRw Ʌ BjRj) ⊕ BwRw = 0) Ʌ ((Mw Ʌ MSK) ⊕ Mj = 0)
[Diagram: example 64-bit Mj and MSKj. One 4-bit field, Mj[0], holds 0010 (Arch = IA64) with MSKj field 1111; another, Mj[3], holds 0001 (OpSys = LINUX) with mask 1111; all other bits are 0.]
Binary Information Set database (BIS-db)
[Diagram: a BIS-db holds one record per worker (w1 … wn), each with its WAR (B, R, M fields), worker id, broker id, the number of unclaimed workers, and a time-stamp.]
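A Python sketch of one BIS-db record, following the field list above (names are illustrative):

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class WAR:
    b: int   # Boolean attributes (16 bits)
    r: int   # range-valued attributes (128 bits; Python ints are unbounded)
    m: int   # multi-valued attributes (64 bits)

@dataclass
class BISEntry:
    broker_id: str
    unclaimed_workers: int
    timestamp: float
    workers: Dict[str, WAR] = field(default_factory=dict)   # worker id -> WAR

entry = BISEntry(broker_id="B-1", unclaimed_workers=2, timestamp=0.0)
entry.workers["w1"] = WAR(b=0x0001, r=0, m=0x0012)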
Broker Architecture
[Architecture diagram: the broker. The gateway broker holds a job queue of received and submitted jobs, a gateway scheduler, a starvation detector, an information service (IS with IS-db), a binary information service (BIS with BIS-db), and a communication controller. Below it, a local scheduler with priority queues q0-q4 serves local clients (C) and local workers (W). The local broker and gateway broker exchange job profiles, labeled jobs, and BIS-db requests/responses over the job bus and information bus through the neighborhood broker overlay (B).]
Speedup
InterGridSim
[Diagram: InterGridSim components. Each regular node runs the allocation, idle, and grid CD protocols; brokers run the broker protocol with a service allocator and form the broker overlay.]
[Diagram: the simulated overlay of brokers B-1 through B-5.]
[Diagram: each broker manages K nodes (1, 2, …, K); the brokers are wired in one of four overlay topologies:]
Ring
Hyper-Cube
Fully connected
Wire-k-out
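A small Python sketch of how the four overlay wirings can be built as adjacency lists (InterGridSim's own wiring code is not shown; this is illustrative):

import random

def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def hypercube(dim):
    n = 2 ** dim
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(n)}

def fully_connected(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def wire_k_out(n, k, seed=0):
    """Each broker picks k random neighbours (directed k-out wiring)."""
    rng = random.Random(seed)
    return {i: rng.sample([j for j in range(n) if j != i], k) for i in range(n)}

print(hypercube(3)[0])   # neighbours of broker 0 in a 3-cube: [1, 2, 4]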
InterGridSim: Output
=======================Cycle : 1991================ Cycle time (ms): 139
# Node types: Type 1 [OS LINUX, CPU 4, M 8]: 20001; Type 2 [OS LINUX, CPU 2, M 4]: 29999
# Workload types: Type 1 [OS LINUX, CPU 2, M 4]: 30000; Type 2 [OS LINUX, CPU 2, M 4]: 40000
Total generated workloads for allocation 70000
Succeeded Allocations 44909 Completed workloads 0 Deployments/cycle 7
Waiting workloads 25091
No. of workload exchanging 62908 Connections Between Brokers/cycle 700
(Deployed At) Broker Indexes: 15 27 49 6 44 36 34 23 45 26 9 5 8 41 17 30 24 40 32 47
Broker Indexes:
(0) (1) (2) (3) (4) (5) (6) (7) …. (50)
Broker Queue size & No. of free CPUs
2835 0 0 0 0 182 182 0
0 3970 3970 3970 3970 1566 1566 3970
Broker Queued workloads
Broker Local Allocations:
2 15 15 15 15 1218 1218 15
Evaluation
Validity of the stored resource information (N → total grid size, M → number of VOs)
[Plots: N = 100, M = 20; and N = 500, M = 100 (log scale)]
[Plot: impact of broker failures on resource information updating (N = 500, M = 100), ring topology]
Evaluation: Workload allocation
Simulation setup:
Results
2. Load Balancing
Thank You!
Questions?
What is Grid computing?
Instead of running programs on one computer, they can be distributed among many computers on the Internet.
“Grid computing is concerned with coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”
Ian Foster & Carl Kesselman, 2001.
[Diagram: three virtual organizations (VO1, VO2, VO3), each with its own broker (B1, B2, B3).]
What is Cloud?
“A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet”
Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu, 2008.
Grid vs Cloud
User Ali: “I need a Scientific Linux with 2 GB RAM!”
Manager(s): “I have Scientific Linux with 3 GB RAM. Take Hero.”
(Resource: Hero)
User: “I need 3 high-CPU Windows machines for 2 weeks.”
Provider: “Available for $1000.”
What comes out of Grid computing?
278,832 volunteer computers
Speed: 769 teraFLOPS = 769 * 10^12 arithmetic operations per second
Cost: FREE