Abdulrahman Azab (and the team)

University of Oslo (UiO)

Nordic e-Infrastructure Collaboration (NeIC)

LUMI User Support Team (LUST)

Why?

Sites

Events - training

Events - workshop

Slack channel - discussions

Members

Site/Organization    Representative(s)
-----------------    ------------------------------
UiO/Sigma2 (LUMI)    Abdulrahman Azab (coordinator)
IJS/Sling            Barbara Krasovec
IJS/Sling            Dejan Lesjak
IZUM/VEGA            Teo Prica
UL                   Leon Kos
UL                   Matjaz Pancur
QNIB Solutions       Christian Kniep
CINECA/Leonardo      Francesco Cola
HPE                  Alfio Lazzaro
HPE                  Jonathan “Bill” Sparks
CSC (LUMI)           Henrik Nortamo
HPE                  David Brayford
Karolina             Lukas Krupcik
Karolina             Jakub Kropacek
Karolina             Radovan Pasek

Topics/projects of interest

Target

  • Efficient container runtimes: Apptainer/Singularity, Sarus, Podman, Charliecloud, … (see the usage sketch after this list)
  • Portable HPC containers: Software stack that can be used efficiently on multiple EuroHPC systems for HPC applications running on CPUs and GPUs
    • Share the binaries or recipes
    • Handling the heterogeneity in GPUs, MPI, etc.
  • Efficient sharing platform
    • Which sharing platform/registry to use
  • Provide a portable container cloud solution (e.g. K8s)
    • How to control the cloud application: K8s in control or Slurm in control
    • VMs? How to preserve performance: virtualization must be lightweight
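
For reference, a typical runtime workflow on a login node looks like the sketch below (Apptainer shown; the registry URL and image name are purely illustrative):

  # Pull an image from a registry into a local SIF file.
  apptainer pull app.sif docker://ghcr.io/example/app:latest
  # Launch it through the batch system.
  srun apptainer exec app.sif app --input data.in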

Container building - options

  • Pre-built containers
    • How compatible are they with the target system?
    • What level of support is provided?
    • Current approach:
      • You are on your own! Refer to the documentation and container inspection tools (see the sketch below)
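
As an illustration, a pre-built image can be inspected before it is trusted on a given system. The commands below assume Apptainer and skopeo are available; the image names are hypothetical:

  # Show labels and build metadata of a local image.
  apptainer inspect app.sif
  # Query a remote registry image without pulling it.
  skopeo inspect docker://docker.io/library/ubuntu:22.04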

Container building - options

  • Build your own container/image
    • Performance vs. portability. Approaches:
      • Performance: provide a set of containers targeting different HW systems:
        • Sharing of portable recipes when possible
        • Define a different image tag/binary/recipe for each system/architecture
      • Portability + performance:
        • Fat binaries: Multiple instruction sets in a single binary
        • Dynamic dispatch: detect the CPU architecture at runtime and use the matching instruction set (see the sketch after this list)
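
One way to approximate dynamic dispatch in a container is an entrypoint script that selects an architecture-specific binary at runtime. A minimal sketch; the binary names and the paths under /opt/app are hypothetical:

  #!/bin/sh
  # Hypothetical entrypoint: pick the build matching the host CPU features.
  if grep -q avx512f /proc/cpuinfo; then
    exec /opt/app/bin/app-avx512 "$@"
  elif grep -q avx2 /proc/cpuinfo; then
    exec /opt/app/bin/app-avx2 "$@"
  else
    exec /opt/app/bin/app-generic "$@"  # baseline x86-64 build
  fi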

Container building - options

  • Build your own container/image
    • Provide a set of base images (as Dockerfiles); see the example below
    • For MPI-enabled containers, the application inside the container must be dynamically linked to an MPI version that is ABI-compatible with the host MPI.
    • Looking forward to rootless build support (Buildah or Singularity)
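
As an example, a base image might install MPICH, whose ABI is shared by several derived implementations (e.g. Intel MPI, Cray MPICH), which keeps host-MPI replacement at runtime feasible. A minimal sketch; the distribution and the application file are illustrative:

  FROM ubuntu:22.04
  # Build tools plus MPICH; the compiler wrappers link dynamically by default.
  RUN apt-get update && apt-get install -y build-essential mpich libmpich-dev \
      && rm -rf /var/lib/apt/lists/*
  # Compile the application with the MPI compiler wrapper.
  COPY hello.c /src/hello.c
  RUN mpicc /src/hello.c -o /usr/local/bin/hello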

Container building - options

  • Official containers
    • Who maintains and manages them, and how?
    • Maintained by the support teams
    • Only when there is high demand
    • Keep the set to a minimum

Current Cloud solution - CSC

Current Cloud solution - Sigma2

Open calls of interest

Activities

  • Monthly community meetings: first Thursday of every month
  • Training activity: One workshop/school per year
  • Hackathons: Two per year
  • Projects

Join the community?

Backup slides

MPI Containers - Host MPI

Using the host MPI

MPI Containers - Container MPI

Strategy for MPI

Fully containerised, optimised MPI installation

Replicating the network stack:

  • Check what kind of operating system the target is running
  • Check what kind of network stack the target system has
    • Type of interconnect, software distribution type and version
  • Check for any additional kernel modules enabled for shared-memory transport (e.g. CMA, KNEM, XPMEM); see the probe sketch below
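
A few probes of this kind might look as follows (a sketch: fi_info is only present if the libfabric tools are installed, and CMA is a built-in kernel feature rather than a loadable module):

  cat /etc/os-release                    # OS distribution and version
  ls /sys/class/infiniband 2>/dev/null   # InfiniBand devices, if any
  fi_info -l 2>/dev/null                 # available libfabric providers
  lsmod | grep -Ei 'knem|xpmem'          # shared-memory transport modules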

During container construction

  • Install all the network stack components
  • Install any other required libraries
  • Build the MPI library against all the installed components
    • It might be necessary to first build some lower-level communication libraries
    • Or there might be a pre-built package available from the same source as the network components.
  • Compile the program with the installed MPI wrappers (see the sketch after this list)
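
A recipe fragment following these steps might look like the sketch below; the MPICH version, the choice of libfabric, and the application file are all illustrative:

  FROM ubuntu:22.04
  # Network stack components first (libfabric here; UCX would be the
  # analogous choice for InfiniBand systems).
  RUN apt-get update && apt-get install -y build-essential wget libfabric-dev \
      && rm -rf /var/lib/apt/lists/*
  # Build MPICH against the installed libfabric.
  RUN wget -q https://www.mpich.org/static/downloads/4.1.2/mpich-4.1.2.tar.gz \
      && tar xf mpich-4.1.2.tar.gz && cd mpich-4.1.2 \
      && ./configure --with-device=ch4:ofi --prefix=/usr/local \
      && make -j && make install
  # Compile the program with the installed MPI wrappers.
  COPY app.c /src/app.c
  RUN mpicc /src/app.c -o /usr/local/bin/app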

Strategy for MPI

Hybrid MPI installation

  • Install MPI inside the container
  • Build the program using the installed MPI
  • Bind mount the host MPI (and all needed dynamic dependencies)
  • Start the program
    • using either srun or mpirun outside the container, depending on the site recommendation. With this approach we are also compatible with the startup environment (assuming that any needed sockets are mounted); see the launch sketch below.
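
A launch might then look like the following sketch (Apptainer shown; the paths and image name are hypothetical and site-specific, and the real set of host libraries to bind varies per system):

  # Bind-mount the host MPI tree into the container and point the runtime
  # linker at it; the host's Slurm starts the ranks.
  export APPTAINER_BIND="/opt/hostmpi"
  export APPTAINERENV_LD_LIBRARY_PATH="/opt/hostmpi/lib"
  srun --ntasks=8 apptainer exec app.sif /usr/local/bin/app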

Containers for sensitive data - future work

Strategy for MPI

We can maintain full portability using one of the following approaches:

  • Discarding the container (assuming minimal other dependencies)
  • Not using MPI (MPI is not a goal!)
  • Restricting execution to within a node (details).

Strategy for MPI

Using a basic MPI installation inside the container enables multi-node execution (details), but introduces the risk of being incompatible with the startup environment. For improved performance we can install an optimised MPI stack, but this reduces the number of systems the container can run on (details). The final approach is the hybrid approach, where we utilise the host MPI library. This requires configuration on each system before running the container and also reduces the number of systems the container can run on (but not in the same way as the previous approach; details).

The most suitable strategy will largely depend on the type of application, the set of targeted systems, and the effort available to package and maintain the different solutions. We hope that this material will help the reader decide which strategy to pursue.

How to … LUMI?

Contacts

Resources