500,000 Container Instances a Day?

Open Science Grid Singularity Infrastructure

Mats Rynge

USC Information Sciences Institute

OSG User Support

Open Science Grid

OSG is a consortium of software, service and resource providers and researchers, from universities, national laboratories and computing centers across the U.S., who together build and operate the OSG project. The project is funded by the NSF and DOE, and provides staff for managing various aspects of the OSG.

Integrates computing and storage resources from over 100 sites in the U.S.

A framework for large scale distributed resource sharing addressing the technology, policy, and social requirements of sharing computing resources.

High Throughput Computing

High Throughput Computing

Sustained computing over long periods of time. Usually serial codes, or threaded/MPI codes using a low number of cores.

vs. High Performance Computing

Great performance over relatively short periods of time. Large scale MPI.

Distributed HTC

No shared file system

Users ship input files and (some) software packages with their jobs.

Opportunistic Use

Applications (esp. with long run times) can be preempted (or killed) by resource owner’s jobs. Applications should be relatively short or support being restarted.
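One way to "support being restarted" is to checkpoint progress to a file and resume from it on the next start. The following is a minimal sketch under assumptions, not an OSG-provided mechanism: the function name, the JSON checkpoint format, and the `max_steps_this_run` parameter (used here only to simulate a preemption) are all hypothetical.

```python
import json
import os


def run_with_checkpoints(total_steps, checkpoint_path, max_steps_this_run=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    Returns the final sum, or None if the run stopped early
    (a stand-in for being preempted by the resource owner).
    """
    # Resume from a previous run if a checkpoint exists.
    state = {"next_step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_done = 0
    while state["next_step"] < total_steps:
        if max_steps_this_run is not None and steps_done >= max_steps_this_run:
            return None  # simulated preemption

        # The "real work" of one step goes here.
        state["total"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1

        # Write to a temporary file and rename it, so that a kill in the
        # middle of a write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["total"]
```

A job that is killed after a few steps and then rescheduled picks up where it left off, provided the checkpoint file is shipped back with the job, since there is no shared file system to rely on.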

Distributed High Throughput ("DHTC")

DHTC Jobs

Container Motivations

  • Consistent environment (default images) - If a user does not specify an image, the job uses a default one. The image contains a decent baseline of software, and because the same image is used across all sites, the user sees a more consistent environment than if the job landed in the environments provided by the individual sites.
  • Custom software environment (user-defined images) - Users can create and use their own custom images, which is useful for very specific software requirements or for software stacks that are tricky to bring along with a job. For example: Python or R modules with dependencies, or TensorFlow.
  • Enables special environments such as GPUs - Special software environments go hand in hand with the special hardware.
  • Process isolation - Sandboxes the job environment so that a job cannot peek at other jobs.
  • File isolation - Sandboxes the job file system so that a job cannot peek at other jobs’ data.
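The motivations above can be mapped onto a Singularity command line. This is only a sketch: the helper `build_singularity_command` is hypothetical, and the slides do not say which options OSG actually passes. It assumes the standard `singularity exec` options `--pid` and `--ipc` (new PID and IPC namespaces), `--contain` (minimal `/dev`, empty `/tmp` and home instead of the host's), and `--nv` (NVIDIA GPU support). The default image path is the one shown in the "Extracted Images" slide.

```python
# Default image; path taken from the "Extracted Images" slide.
DEFAULT_IMAGE = (
    "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest"
)


def build_singularity_command(job_argv, image=None, use_gpu=False):
    """Return an argv list that runs job_argv inside a container.

    Hypothetical helper for illustration; not the actual OSG wrapper.
    """
    cmd = ["singularity", "exec"]
    cmd += ["--pid", "--ipc"]  # process isolation: separate PID/IPC namespaces
    cmd += ["--contain"]       # file isolation: do not share host /tmp, home
    if use_gpu:
        cmd.append("--nv")     # special environment: expose NVIDIA GPUs
    # Consistent environment: fall back to the default image when the
    # user did not ask for a custom one.
    cmd.append(image or DEFAULT_IMAGE)
    cmd.extend(job_argv)
    return cmd
```

For example, `build_singularity_command(["python3", "analyze.py"])` yields a command that runs the (hypothetical) script `analyze.py` in the default image, while passing `image=` selects a user-defined one.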

Container Instances per Day

Job Breakdown

CERN Virtual Machine FileSystem

You are here!

Extracted Images

OSG stores container images on CVMFS in extracted form. That is, we take the Docker image layers or the Singularity img/simg files and export their contents onto CVMFS as a plain directory tree. For example, running ls on one of the container directories looks similar to ls / on any Linux machine:

$ ls /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest/
cvmfs  host-libs  proc  sys  anaconda-post.log     lib64
dev    media      root  tmp  bin                   sbin
etc    mnt        run   usr  image-build-info.txt  singularity
home   opt        srv   var  lib

Result: Most container instances only use a small part of the container image (50-150 MB) and that part is heavily cached in CVMFS!
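The example path suggests a simple naming scheme: the Docker image name, including its tag, appended to the CVMFS repository root. A minimal sketch of that mapping follows. The function names are hypothetical, the default tag `latest` follows the usual Docker convention and is an assumption here, and the parsing is deliberately naive (it does not handle registry hosts with ports).

```python
import os

# Repository root as it appears in the ls example.
CVMFS_BASE = "/cvmfs/singularity.opensciencegrid.org"


def cvmfs_image_path(docker_image):
    """Map 'namespace/name[:tag]' to the extracted image directory."""
    name, _, tag = docker_image.partition(":")
    if not tag:
        tag = "latest"  # assumption: Docker's conventional default tag
    return f"{CVMFS_BASE}/{name}:{tag}"


def looks_like_extracted_image(path, expected=("bin", "etc", "usr")):
    """Check that path contains the top-level directories of a root file system."""
    return all(os.path.isdir(os.path.join(path, d)) for d in expected)
```

With this scheme, `cvmfs_image_path("opensciencegrid/osgvo-el7")` gives the directory listed in the example, and `looks_like_extracted_image` on that directory would be expected to hold on a machine with CVMFS mounted.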

cvmfs-singularity-sync

  • Containers are defined using Docker
    • Public Docker Hub
  • … and executed with Singularity
  • https://github.com/opensciencegrid/cvmfs-singularity-sync
