2nd PowerStack White Paper WG Meeting

(Jan 28, 2020)

https://docs.google.com/presentation/d/1qPISSrP-h8IwrOOwsHVZmoO2hNWmMa_11cbv7WZoJ7c/edit

Agenda

  • (recap) Contents and tentative assignment of white paper
  • Target and scope of PowerStack
  • Discussion of Mandatory/Optional PowerStack features
  • Meeting schedule and others
  • Discussion of the definition of compliant/conformant PowerStack

Comment:

EEHPC SOP WG workshop @ CCGrid (deadline will be April?)

Recap: Assignment of Contributors (tentative)

  • 1. Introduction of PowerStack
    • Background: power - a key design constraint, current system design, challenges, … (Masaaki, Sid)
    • Motivation for designing an HPC PowerStack (Martin, Sid)
    • Target and scope of PowerStack (only HPC or including Cloud? including I/O subsystem, cooling?) (Tapasya, Sid) -> need to discuss at the next meeting
    • Relationship with PowerAPI (Ryan,Sid)
  • 2. Definition of terminology (ex. “power limit”, “power cap”, …) (All, Masaaki) - everyone can contribute to add terms
  • 3. Definition of PowerStack
    • What is PowerStack? What is a PowerStack compliant/conformant/high-quality system? (Matthias, Dann, Sid)
    • System image of PowerStack enabled systems (necessary HW/SW component) (Daniele, Masaaki)
    • Features of PowerStack (desired functionalities, organization of tools, …) (Ryan helps create overall picture)
      • What should be included (Mandatory) and what is optional (All) -> need to discuss at the next meeting
      • goal (input/output, optimization goal) of the layers of the stack -> section 1 or here?
      • API for each layer, related to PowerAPI (Ryan)
  • 4. Strawman design of PowerStack (need to brush up contents of this section)
    • Example implementations/tools (each implementer)
    • Example of existing tools (APIs, cluster manager, job scheduler, …)
    • ??Strawman interfaces (ex. GEOPM interface, Variorum)??
    • (not the detailed implementation of the tools, but..)
  • 5. Use cases
    • Ask use-case WG to put their summary, need joint discussion with use case WG, see use-case doc for PowerAPI (power api)
  • 6. Conclusions (Masaaki)
  • (Past docs for reference)

Comment:

How about adding research results (performance results, …)?

-> may not be good for reference docs for procurement?

-> need to set who is the target audience

Target and Scope of PowerStack

  • Target and Scope
    • Of course, HPC systems
      • XPU (CPU/GPU/...), Memory
      • Job scheduler, cluster manager, workload manager, resource manager…
    • Also Cloud? -> Partly
      • difference from HPC center?
        • has different workload, different algorithm may be needed
        • similar strategies can be used for cloud?
      • PowerAPI already used in cloud
      • Hyperscalers?
    • Including I/O subsystem? → NO (but need to write interface requirement)
      • I/O node, Storage
      • any publication about power/energy of storage? → large parallel file system
      • is the I/O subsystem the same HW as the compute nodes? procured together as a single system?
      • not dominant in terms of power/energy, purchase cost may dominate
    • Including NW? → YES!
      • individual NW cards, switches (2-3 kW each)
      • we need monitoring/control capability and interface to PS
      • power control: channel level
      • (PowerAPI document has the NW section)
      • challenge: NW is shared by all users --> opportunity for job scheduler to control placement
    • Including cooling? -> NO (but need to mention interface between PS and outside of the system)
      • Fan, Chiller, …
      • large impact in large systems, depends on types of chiller, cooling is outside of control loop
    • Procurement? -> Yes

Comment:

Depends on what PS is and why PS is needed

Mandatory/Optional PowerStack Features

  • Mandatory features
    • Enforcing a specified running average power limit
      • at which level? Component/Node/Rack/System?
    • Enforcing cumulative energy limits across an entire cluster
    • Providing component-level power and energy measurements, and associate those measurements with jobs
    • Monitoring has higher priority than control
    • Should be adaptive, hierarchical, closed loop
    • Need to discuss definition of “feature” first
    • Reference
  • Optional features (high quality implementation)
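
Illustration (not from the meeting): the mandatory features above can be made more concrete with a small sketch. The Python below is only a minimal sketch under stated assumptions. It checks a running average power limit over a sliding window, and it attributes component-level energy to jobs. All names (RunningAveragePowerLimit, energy_joules, energy_by_job) are hypothetical and are not part of PowerAPI, GEOPM, or Variorum. The average is a simple mean of the samples in the window, which assumes roughly uniform sampling.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RunningAveragePowerLimit:
    """Hypothetical helper: checks a power limit over a sliding time window."""
    limit_watts: float
    window_seconds: float
    samples: deque = field(default_factory=deque)  # (timestamp_s, watts)

    def add_sample(self, timestamp_s: float, watts: float) -> None:
        self.samples.append((timestamp_s, watts))
        # Drop samples that have fallen out of the averaging window.
        while timestamp_s - self.samples[0][0] >= self.window_seconds:
            self.samples.popleft()

    def average_watts(self) -> float:
        # Simple mean of the samples in the window (assumes uniform sampling).
        if not self.samples:
            return 0.0
        return sum(w for _, w in self.samples) / len(self.samples)

    def is_violated(self) -> bool:
        return self.average_watts() > self.limit_watts


def energy_joules(samples):
    """Trapezoidal integration of a list of (timestamp_s, watts) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_by_job(component_samples, job_of_component):
    """Sum component-level energy per job (the mapping is an assumed input)."""
    totals = {}
    for component, samples in component_samples.items():
        job = job_of_component[component]
        totals[job] = totals.get(job, 0.0) + energy_joules(samples)
    return totals


# Usage: a node-level limit of 100 W averaged over a 3 s window.
node_limit = RunningAveragePowerLimit(limit_watts=100.0, window_seconds=3.0)
for t, w in [(0, 90.0), (1, 95.0), (2, 120.0)]:
    node_limit.add_sample(t, w)
violated_before = node_limit.is_violated()
node_limit.add_sample(3, 80.0)  # the sample at t=0 leaves the window
violated_after = node_limit.is_violated()

per_job = energy_by_job(
    {
        "cpu0": [(0, 100.0), (10, 100.0)],
        "gpu0": [(0, 200.0), (10, 200.0)],
        "cpu1": [(0, 50.0), (10, 50.0)],
    },
    {"cpu0": "jobA", "gpu0": "jobA", "cpu1": "jobB"},
)
```

The same check could be applied at component, node, rack, or system level by feeding it samples aggregated at that level. Which level is mandatory is the open question noted in the list above.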

Regular Meeting Schedule and Others

  • Regular bi-weekly calls
    • PST: Tue. 7-8am → Wed. 7-8am?
    • CET: Tue. 4-5pm → Wed. 4-5pm?
    • JST: Wed. 12-1am → Thu. 12-1am?
    • Only for the next meeting
      • PST: Feb. 11 at 7-8am
      • CET: Feb. 11 at 4-5pm
      • JST: Feb. 12 at 12-1am
  • Mailing-lists for white paper WG
  • Link to editable white paper draft version 0.1 (just a skeleton)
  • Way to communicate with use-case WG?