
Soft Cluster powercap at SuperMUC-NG with EAR

24/10/2022

EE HPC SOP Workshop

Lluís Alonso

lluis.alonso@bsc.es


Soft powercap

  • Introduction: Why is powercap necessary?
  • EAR overview
  • EAR extensions to support powercap
  • Node powercap
  • Soft cluster powercap
  • Powercap evaluation with synthetic workloads
  • SuperMUC-NG experiments
  • Related work

EE HPC SOP Workshop, October 2022


Introduction

  • Power management has become an important topic for HPC centers.
    • Hardware constraints
    • Resource efficiency
    • Cost constraints


EAR overview

  • EAR is system software for energy management, created in collaboration between BSC and Lenovo.
  • EAR offers energy monitoring, accounting, control and optimisation.
  • Four main components:
    • Node power manager (EARD)
    • Database manager (EARDBD) [Not involved]
    • Optimisation library (EARL)
    • Cluster power manager (EARGM)


Powercap extensions

  • The primary goal is to prevent power consumption from exceeding the cap.
    • The secondary goal is to maximise power utilisation under that cap (power balance).
  • Hierarchical approach with 4 levels:
    • Global cluster powercap is controlled by meta-EARGM.
    • Sub-cluster powercap (islands) is controlled by an EARGM.
    • Node powercap is controlled by the EARD.
    • Hardware domains (CPU/GPU) are controlled by specific plugins loaded by EARD.


Powercap extensions

  • Each level implements:
    • Powercap control: guarantees that the layer does not exceed its power allocation.
    • Powercap status: evaluates current power consumption and sends it to the layer above with hints of its power needs.
    • An API to be contacted by upper layers.
    • Powercap balance: redistributes power between the domains it controls.
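
The per-level contract above can be sketched as a small interface. This is an illustrative sketch only: the class and method names (PowercapLevel, control, status, set_allocation, balance) are assumptions for clarity, not EAR's actual API.

```python
from abc import ABC, abstractmethod

class PowercapLevel(ABC):
    """One level of the powercap hierarchy (meta-EARGM, EARGM, EARD, plugin)."""

    @abstractmethod
    def control(self):
        """Powercap control: keep this layer within its power allocation."""

    @abstractmethod
    def status(self):
        """Powercap status: report consumption and power needs to the layer above."""

    @abstractmethod
    def set_allocation(self, watts):
        """API entry point used by the upper layer to change this layer's cap."""

    @abstractmethod
    def balance(self):
        """Powercap balance: redistribute power among the domains it controls."""
```

Each concrete level (cluster, island, node, hardware domain) would then implement these four operations against its own power source and actuators.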


Node powercap control

  • Requirement: never exceed a given DC node powercap.
  • A software approach is needed, since no hardware mechanism both controls full-node power usage (including GPUs) and offers power balance.
  • Limited frequency of power/energy readings. Power controlled in two stages:
    • Low-level domain (CPU or GPU) control: high-frequency validation (~500 ms) of domain consumption, changing its settings to meet requirements.
    • Full-node control: measures entire-node power at a reduced frequency and dynamically adapts the power allocated to each sub-domain.
  • The domain managers (plugins) report their status periodically to the node manager (EARD). If a domain cannot meet its requested settings (e.g. the requested frequency), it also reports its level of stress, i.e. how far it currently is from the target.
  • The node manager gets the domains’ statuses and decides on the possible actions:
    • Redistribute power between domains so that both are under the same level of stress.
    • Request additional power to the global manager if settings are not being met.
    • If requested settings are being met, it marks a percentage of the excess power as potential to be released.
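
The node manager's decision step could be sketched as follows. DomainStatus, decide, and the stress/step model are hypothetical illustrations of the three actions listed above, not EAR code.

```python
from dataclasses import dataclass

@dataclass
class DomainStatus:
    allocation_w: float   # power currently allocated to the domain (W)
    measured_w: float     # power actually drawn (W)
    stress: float         # 0.0 = requested settings met; >0 = distance to target

def decide(domains, step_w=10.0, release_frac=0.5):
    """One decision round of the node manager over its domain statuses.

    Returns a (action, watts) pair and, for a rebalance, mutates the
    per-domain allocations in place."""
    stressed = [d for d in domains if d.stress > 0]
    if stressed and len(stressed) < len(domains):
        # Some domains are short of power while others are not:
        # shift power from the least- to the most-stressed domain.
        src = min(domains, key=lambda d: d.stress)
        dst = max(domains, key=lambda d: d.stress)
        src.allocation_w -= step_w
        dst.allocation_w += step_w
        return ("rebalance", step_w)
    if stressed:
        # Every domain is short of power: request more from the global manager.
        return ("request", step_w * len(domains))
    # All settings are met: mark a fraction of the unused headroom as releasable.
    headroom = sum(d.allocation_w - d.measured_w for d in domains)
    return ("release", release_frac * headroom)
```

Driving both domains toward equal stress is what the slide calls power balance; the request/release pair is how the node hints its power needs upward.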


Node powercap control

  • Power consumption enforcement:
    • Dynamic computation of the ratio between node power and (CPU+DRAM)+GPU power
    • Power limit assigned per domain
    • Two frequencies of power validation
      • Short term (if needed), based on hardware readings every ~500ms
      • Medium term, based on IPMI/DCMI power every 10s
  • Dynamic power balance.
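
A minimal sketch of the enforcement scheme above, under two stated assumptions: the node cap is split between domains in proportion to their measured draw (the dynamically computed ratio), and the short-term check steps the domain frequency down when the domain is over its share. The names split_node_cap and short_term_check are invented for illustration.

```python
def split_node_cap(node_cap_w, cpu_w, gpu_w):
    """Divide the node cap between domains in proportion to their
    current measured draw (the node / (CPU+DRAM)+GPU ratio)."""
    total = cpu_w + gpu_w
    return node_cap_w * cpu_w / total, node_cap_w * gpu_w / total

def short_term_check(domain_cap_w, measured_w, freq_khz, step_khz=100_000):
    """~500 ms validation: step the domain frequency down when the
    domain exceeds its share of the cap, otherwise keep its settings.
    (The slower 10 s IPMI/DCMI reading would re-run split_node_cap.)"""
    if measured_w > domain_cap_w:
        return freq_khz - step_khz   # over budget: throttle one step
    return freq_khz                  # within budget: keep current frequency
```

The medium-term (10 s) node reading then feeds back into split_node_cap, which is what makes the per-domain limits dynamic rather than fixed.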


Soft cluster powercap (at LRZ)

  • Cost constraints: peak power usage is penalised.
  • Goal: detect excessive power consumption and bring it below limit.
  • Node power by default is unlimited.
  • Configuration: power limit, activation threshold and action, deactivation threshold and action.
  • Soft cluster powercap algorithm:
    • EARGM periodically aggregates the power of the nodes under its control.
    • If the total power approaches the limit set for the cluster, it sets a power limit to all the computational nodes.
    • If there is a powercap currently in action and the total power goes below a set threshold, the limitation is lifted.
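
The algorithm above reduces to a single decision per monitoring period. The sketch below is illustrative (soft_powercap_step and its parameters are invented names), using the 90%/80% activation/deactivation thresholds from the SuperMUC-NG evaluation as defaults.

```python
def soft_powercap_step(node_powers_w, limit_w, act=0.90, deact=0.80,
                       capped=False):
    """One EARGM monitoring period: aggregate node power and decide
    whether nodes should be capped. Returns the new capped state."""
    total = sum(node_powers_w)
    if not capped and total >= act * limit_w:
        return True    # total power approaches the limit: cap all nodes
    if capped and total <= deact * limit_w:
        return False   # consumption fell below the threshold: lift the cap
    return capped      # otherwise keep the current state (hysteresis)
```

The gap between the activation and deactivation thresholds provides hysteresis, so the cap is not repeatedly set and lifted when total power hovers near the limit.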


Evaluation of node powercap

  • Tested on a node with 2 x Intel Xeon Gold 6126 (12 cores each, 125 W TDP)

  Powercap range (W)    Kernel                    CPU threads
  ------------------    ----------------------    -----------
  300-200               BT-MZ.C.x (CPU bound)     24
  300-200               DGEMM (AVX 512)           24
  350-250               STREAM (memory bound)     24


Evaluation 1: BT-MZ.C.x

  • CPU bound application.
  • Each powercap change is a new kernel execution


Evaluation 2: DGEMM

  • AVX 512 application
  • Each powercap change is a new kernel execution


Evaluation 3: STREAM

  • Memory bound application
  • Each powercap change is a new kernel execution


Evaluation: SuperMUC-NG

  • Experiment done in one island (792 nodes)
  • Each node with 2 x Intel Skylake Xeon Platinum 8174 (48 cores in total), TDP 240 W per socket.
  • Power validation using power distribution units (PDU) measurements (AC power).
  • Powercap for the island set to 285 kW; powercap activation threshold at 90% and deactivation threshold at 80% of the limit.
  • Cluster power monitoring period set to 2 minutes.


Evaluation: SuperMUC-NG

  • Same application running on 792 nodes at once (38016 cores)
  • Multiple jobs, same application

NPB-BT running on all nodes

Wavesim running on all nodes


Related work

  • Node powercap
    • Powercap algorithms:
      • Machine learning approaches selecting the best settings to minimise power usage.
      • Reactive approaches modifying the settings as the application runs.
    • Hardware tools (CRAY, RAPL, Intel Node Manager, Nvidia SMI)
  • Cluster powercap
    • SLURM
    • PBSPro


Conclusions and current/future work

  • The implemented system meets the requirements of SuperMUC-NG.
  • As an extension of the soft powercap: adding other sources of power consumption (beyond compute nodes).
  • Job-level powercap.
  • Evaluation of cluster power reallocation under a hard powercap.


Questions
