1 of 19

Recent Advances in ReFrame

8th EasyBuild User Meeting

Vasileios Karakasis (NVIDIA) and Theofilos Manitaras (CSCS)

April 24, 2023

2 of 19

Summary

  1. First part (Vasileios)
    1. Community update
    2. Overview of ReFrame 4.0 changes
    3. Review of some of the less well known pre 4.0 features, that are quite useful

  • Second part (Theofilos)
    • Overview of the programmable configuration in ReFrame
    • Using programmable configuration for container-based testing
    • Using programmable configuration for user-environment-based testing

2

3 of 19

ReFrame

8th EasyBuild User Meeting

ReFrame is a powerful framework that enables system testing and performance testing as code with unique HPC features.

  • Composable tests written in Python allowing the creation of reusable test libraries
  • Multi-dimensional test parameterisation
  • Support for test fixtures
  • Parallel execution of tests
  • Programmable configuration
  • Support for multiple HPC schedulers, modules systems, build systems and container runtimes
  • Integration with Elastic and Graylog for feeding directly performance data from tests
  • CI integration through Gitlab child pipelines

3

4 of 19

ReFrame community

8th EasyBuild User Meeting

  • Documentation: https://reframe-hpc.readthedocs.io
    • 300–400 unique readers monthly from all over the world
  • Slack workspace (more than 230 members):
    • Join us through this link.
  • Github
    • ReFrame HPC community group: https://github.com/reframe-hpc
      • Collection of public forks of site test repositories
    • 45 contributors since the beginning
    • Backlog: https://github.com/orgs/reframe-hpc/projects/1
    • Code: https://github.com/reframe-hpc/reframe
      • Give it a ⭐ !
  • PyPI: https://pypi.org/project/ReFrame-HPC/
    • More than 8K downloads/month according to pepy.tech

4

5 of 19

New development workflow since 4.0

8th EasyBuild User Meeting

  • We introduced a develop branch
    • New features go to this branch
    • The /latest docs point to this branch
    • This is the default Github branch
  • The master branch remains as the release branch
    • All releases are made from master
    • Bug fixes, documentation updates, minor enhancements target this branch directly
    • Patch-level releases every one or two weeks and are merged into develop
    • develop will be merged into master just before the next minor or major release
  • Pros
    • Quick release of bug fixes on top of stable releases
    • We can follow more accurately the semantic versioning scheme
  • Cons
    • Periodical synch'ing of develop and master
    • Might sometimes be confusing which branch a PR should target

Check also the updated contribution guide: https://github.com/reframe-hpc/reframe/wiki/contributing-to-reframe

5

6 of 19

Changes in ReFrame 4 – Dropped features

8th EasyBuild User Meeting

All features deprecated in 3.x versions are dropped:

  • @parameterized_test is replaced by the parameter builtin
  • The test's name is now read-only
  • The various test method decorators are only accessible through their builtin names (e.g., @run_after instead of @rfm.run_after)
  • --force-local, --strict and --ignore-check-conflicts options are dropped
  • The schedulers configuration section is replaced by a sched_options section inside each partition definition.
    • NOTE: This is broken in 4.0 and fixed in 4.1
  • Test's variables attribute is deprecated over the new env_vars.

More here: https://reframe-hpc.readthedocs.io/en/stable/whats_new_40.html#dropped-features-and-deprecations

6

7 of 19

Changes in ReFrame 4 – New features

8th EasyBuild User Meeting

Configuration can be split in multiple files

  • Scoping was already a feature through the use of target_systems
  • Now scopes or parts of the configuration can be split in multiple files
  • No need to maintain huge configuration files and repeat the builtin config; the configuration file contains only the information it needs to!
    • In this example, no need to redefine the generic system and builtin environment; they are still valid
    • No need to redefine logging config or any other section.
  • -C option can now be chained

site_configuration = {

'systems': [

{

'name': 'tresa',

'descr': 'My Mac',

'hostnames': ['tresa'],

'modules_system': 'nomod',

'partitions': [

{

'name': 'default',

'scheduler': 'local',

'launcher': 'local',

'environs': ['gnu'],

}

]

}

],

'environments': [

{

'name': 'clang',

'cc': 'clang',

'cxx': 'clang++',

'ftn': '',

'target_systems': ['tresa']

},

]

}

7

8 of 19

Changes in ReFrame 4 – New features (cont'd)

8th EasyBuild User Meeting

  • The filelog log handler for file-based performance logging has been fundamentally revised
    • Default output is CSV so that it can be easily post-processed
    • A header line is printed in every file
    • If the logged fields change, a new log file is created with an updated header and the old is backed up
  • Custom parallel launchers can be directly defined in the configuration file
    • No need extending the framework!
    • … but DO extend it and submit a PR if others can benefit from it!
  • New backends
    • Apptainer container platform
    • Scheduler backend for the Flux Framework

8

9 of 19

Changes in ReFrame 4 – New features (cont'd)

8th EasyBuild User Meeting

New test naming scheme

  • Informational and human readable
  • Tests can be selected by name, by hash or by variant using the -n option
    • -n '^osu_.*'
    • -n /03d6f48f
    • -n osu_allreduce_test@3

9

- osu_allreduce_test %mpi_tasks=16 /03d6f48f @generic:default+builtin

^build_osu_benchmarks ~generic:default+builtin 'osu_binaries /5cf701b0 @generic:default+builtin

^fetch_osu_benchmarks ~generic 'osu_benchmarks /9fc7952e @generic:default+builtin

Test or fixture name

Test parameters

Test hash

Test case info

Fixture scope

Fixture variable name

10 of 19

Changes in ReFrame 4 – New features (cont'd)

8th EasyBuild User Meeting

  • New --dry-run option (since 4.1)
    • Generates all scripts that will be executed and validates as much of the test as possible
    • Tests can also check if in dry-run mode and adapt by calling is_dry_run().
  • Support for custom formatting of JSON records sent to Elastic (since 4.1)
    • Set the callable json_formatter in the httpjson perflog handler
    • Useful to meet requirements of remote schemas and other constraints
  • New --reruns and --duration options for stress testing (since 4.2)
    • Repeatedly run the same test session until a number of runs is reached or a timeout expires
    • Results and failure statistics will be reported from all runs
    • Use with care! 😁
      • reframe -n gpu_burn_check --distribute=all --duration=24h

10

11 of 19

Old but gold – System/Environment features

Extended syntax for valid_systems and valid_prog_environs (since 3.11)

  • Tests are no more bound to specific system names and/or environment names
  • The test can list the features or the properties of a system and/or environment that is valid or not valid for.

'partitions': [

{

'name': 'mypart',

'environs': ['myenv', ...],

'features': ['gpu', 'ib'],

},

...

],

'environments': [

{

'name': 'myenv',

'features': ['cuda', 'mpi'],

'extras': {'mpi_kind': 'mpich'}

},

...

]

11

Example config

# AND features

valid_systems = ['+gpu +ib']

# OR features

valid_systems = ['+gpu', '+ib']

# NOT features

valid_prog_environs = ['-cuda']

# Select extras

valid_prog_environs = ['%mpi_kind=mpich']

Test syntax

More in Theo's part

12 of 19

Old but gold – Command-line options

8th EasyBuild User Meeting

  • The -S or --setvar option (since 3.8 with subsequent refinements)
    • Sets test variables from the command-line
    • Allows also to set variables in nested fixtures
    • Very useful for running test interactively and for experimentation
  • Clone and distribute tests all over the cluster (since 3.12)
    • --repeat=N: repeat selected tests N times
    • --distribute[=STATE]: run the selected tests on every node in STATE
      • Can be combined with the -J option as well:
      • reframe -J reservation=foo --distribute=all --repeat=10 -n my_stress_test -r
  • Generate Gitlab CI child pipelines running the selected tests:
    • --ci-generate (since 3.4.1)
    • Control the CI pipeline from within the test using ci_extras (since 4.2)

E.g.: ci_extras = {'gitlab': {'only': {'refs': ['merge_requests']}}}

12

13 of 19

Old but gold – Execution modes

8th EasyBuild User Meeting

  • Named collections of command line options defined in the configuration file and selectable with the --mode command line option (since 2.5):
    • Treat reframe execution as a black box
    • "Record" a test experiment for future (especially useful in combination with variables and the -S option)
    • Command-line options are combined with those implicitly passed by the mode

'modes': [

{

'name': 'ping_perf',

'options': [

'-c tests/ping.py',

'-S clients=1',

'-S interval=100',

'-n ping_test',

'--exec-order=uid',

'--performance-report',

'--exec-policy=serial',

'--keep-stage-files'

]

}

]

13

Example config

reframe --mode=ping_perf -r

reframe --mode=ping_perf -S clients=10 -r

reframe --mode=ping_perf -S foo=bar -r

reframe --mode=ping_perf --exec-policy=async -r

14 of 19

Old but gold – Programmable configuration

8th EasyBuild User Meeting

ReFrame's configuration file is essentially a Python module

  • You can dynamically generate system/environment entries
    • Useful in cloud environments where the default hostname-based system entry auto-detection is not helpful
    • Useful for generating environments specs based on runtime metadata (more from Theo)
  • Site-specific fine-tuning
    • Custom parallel launchers (since 4.0)
    • Log record formatting for sending to Elastic (since 4.1)

14

15 of 19

Old but gold – Programmable configuration (cont'd)

8th EasyBuild User Meeting

from reframe.core.backends import register_launcher

from reframe.core.launchers import JobLaucher

@register_launcher('slrun')

class MySmartLauncher(JobLauncher):

def command(self, job):

return ['slrun', '-n', job.num_tasks, ...]

site_configuration = {

'systems': [

{

'name': 'my_system',

'partitions': [

{

'name': 'my_partition',

'launcher': 'slrun',

...

}

]

}

]

}

def prepend_prefix(record, extras, ignore_keys):

json_record = {}

for k, v in record.__dict__.items():

if not k.startswith('_') and

k not in ignore_keys:

json_record[f'my_{k}'] = v

return json.dumps([json_record])

site_configuration = {

'logging': [{

'handlers_perflog': [{

'type': 'httpjson',

'url': 'http://elastic_server/',

'level': 'info',

'json_formatter': prepend_prefix

}]

}]

}

15

Custom launcher

Custom record formatter

16 of 19

Old but gold – Dynamic test generation

The make_test() API call allows to create tests programmatically (since 3.10)

16

import reframe.core.builtins as builtins

from reframe.core.meta import make_test

def set_message(obj):

obj.executable_opts = [obj.message]

def validate(obj):

return sn.assert_found(obj.message, obj.stdout)

hello_cls = make_test(

'HelloTest', (rfm.RunOnlyRegressionTest,),

{

'valid_systems': ['*'],

'valid_prog_environs': ['*'],

'executable': 'echo',

'message': builtins.variable(str)

},

methods=[

builtins.run_before('run')(set_message),

builtins.sanity_function(validate)

]

)

class HelloTest(rfm.RunOnlyRegressionTest):

valid_systems = ['*']

valid_prog_environs = ['*']

executable = 'echo'

message = variable(str)

@run_before('run')

def set_message(self):

self.executable_opts = [self.message]

@sanity_function

def validate(self):

return sn.assert_found(self.message, self.stdout)

hello_cls = HelloTest

This is what the --distribute and --repeat options leverage internally

17 of 19

Domain-specific test generation using make_test

Example: Generate a series of STREAM benchmark workflows using a domain-specific spec file

17

stream_workflows:

- elem_type: 'float'

array_size: 16777216

num_iters: 10

num_cpus_per_task: 4

- elem_type: 'double'

array_size: 1048576

num_iters: 100

num_cpus_per_task: 1

- elem_type: 'double'

array_size: 16777216

num_iters: 10

thread_scaling: [1, 2, 4, 8]

- stream_test_2 %num_threads=8 %stream_binaries.elem_type=double %stream_binaries.array_size=16777216 %stream_binaries.num_iters=10 /7b20a90a

^stream_build %elem_type=double %array_size=16777216 %num_iters=10 ~tresa:default+gnu 'stream_binaries /1dd920e5

- stream_test_2 %num_threads=4 %stream_binaries.elem_type=double %stream_binaries.array_size=16777216 %stream_binaries.num_iters=10 /7cbd26d7

^stream_build %elem_type=double %array_size=16777216 %num_iters=10 ~tresa:default+gnu 'stream_binaries /1dd920e5

- stream_test_2 %num_threads=2 %stream_binaries.elem_type=double %stream_binaries.array_size=16777216 %stream_binaries.num_iters=10 /797fb1ed

^stream_build %elem_type=double %array_size=16777216 %num_iters=10 ~tresa:default+gnu 'stream_binaries /1dd920e5

<...>

Found 6 check(s)

Domain spec file

Generated tests

18 of 19

Domain-specific test generation using make_test

The idea

  • Everything happens in a normal test file
  • The spec file is passed in an environment variable
  • The test file reads the spec, generates the tests using make_test and registers them with the simple_test decorator.

Full code at https://github.com/reframe-hpc/reframe/pull/2866.

def load_specs():

spec_file = os.getenv('STREAM_SPEC_FILE')

with open(spec_file) as fp:

return yaml.safe_load(fp)

def generate_tests(specs):

tests = []

for i, spec in enumerate(specs['stream_workflows']):

test_body = {}

thread_scaling = spec.pop('thread_scaling', None)

test_body = {

'stream_binaries': builtins.fixture(

stream.stream_build, scope='environment', variables=spec)

}

methods = []

if thread_scaling:

def _set_num_threads(test):

test.num_cpus_per_task = test.num_threads

test_body['num_threads'] = builtins.parameter(thread_scaling)

methods.append(builtins.run_after('init')(_set_num_threads))

tests.append(make_test(f'stream_test_{i}',

(stream.stream_test,), test_body, methods))

return tests

# Register the tests with the framework

for t in generate_tests(load_specs()):

rfm.simple_test(t)

18

A standard test file

19 of 19

Future outlook

8th EasyBuild User Meeting

  • Improve reporting and post processing of reports
    • Search and compare easily with past reports
  • Generalise the system entry auto-detection method, so that it becomes easier to integrate with cloud environments
  • Allow test parameterisation from the command line
    • Re-parameterise a test based on an existing parameterise
    • Parameterise a test based on an existing variable
  • Generalise test filtering
    • E.g., based on variable values

We are limited in bandwidth but we are more than happy to accept your contributions!

19