4. Cyberinfrastructure for Reproducible Experimentation

Co-Leads: David Balenson (University of Southern California Information Sciences Institute), Patrick Traynor (University of Florida)

Google Drive folder w/ slides and other materials: https://bit.ly/ci-breakout

Breakout Session Description

Enabling reproducible experimentation on shared hardware that is easily and remotely accessible by all researchers has the potential to democratize cybersecurity and privacy research and especially benefit underserved researchers and students, enabling them to compete on an equal footing with those from top-tier institutions.

This session will explore NSF-funded research infrastructure such as Chameleon, CloudLab, FABRIC, POWDER, and SPHERE (formerly DETER) and the hardware, software, and other capabilities needed to support reproducible experimental research in cybersecurity and privacy.

The session will explore questions such as:

  • what is needed for experimentation in different fields of cybersecurity and privacy research;
  • how researchers experiment today (e.g., in a lab, in a testbed, on the real Internet);
  • what would a testbed have to offer to be able to experiment in it; and
  • how we as a community can improve the quality and increase the reuse of cybersecurity artifacts (code, datasets, experiment scenarios, etc.) in published papers?

Participants

  • David Balenson, USC-ISI (co-lead)
  • Patrick Traynor, Florida (co-lead)
  • Manuel Egele, Boston University (scribe)
  • Yixin Sun, University of Virginia (scribe)
  • Chris Martens, Northeastern University
  • Christian Collberg, University of Arizona
  • Danny Y. Huang, New York University
  • Jelena Mirkovic, USC-ISI
  • John Liagouris, Boston University
  • Oshani Seneviratne, RPI

1. What is the topic? Why is it important to society? To a secure and trustworthy cyberspace? In other ways?

  • Validation of experimental results*
    • Repeatability: same team, same infrastructure
    • Reproducibility: new team, same infrastructure
    • Replicability: new team, new infrastructure
  • Cornerstone of good science
  • Without reproducibility we cannot advance as a science
  • Helps quantify uncertainty
  • Leads to transparency, which can help identify potential limitations
  • Building on the work of others avoids starting from scratch, incurring overhead, and slowing down progress
  • Can be a bridge for transition to practice, which can be difficult
  • Helps researchers understand how to run systems when delving into a problem area anew (training a new generation of researchers)

* 2019 NAS report on Reproducibility and Replicability in Science, https://nap.nationalacademies.org/catalog/25303/reproducibility-and-replicability-in-science

2. Is there an existing body of research and/or practice? What are some highlights or pointers to it?

  • Infrastructure
    • Individual researcher and lab infrastructure
    • University resources
    • Shared resources: Chameleon, CloudLab, FABRIC, POWDER, SPHERE, …
  • Artifact evaluation
    • Initiatives at top conferences: WiSec, ACSAC, USENIX Security, ACM CCS, NDSS, IEEE S&P (forthcoming!), CHES, SysTEX, WOOT, …
    • Evaluation and badging systems: ACM, IEEE, USENIX Security, NDSS, …
    • Artifact catalogues: findresearch.org, secartifacts.github.io, SEARCCH
    • Artifact repositories: GitHub, Zenodo, Comunda
  • Reproducibility studies

Limited availability of ML artifacts; most are not runnable or do not provide clear output

3. What are important challenges that remain? Are there new challenges that have arisen based on new models, new knowledge, new technologies, new uses, etc.?

  • Infrastructure
    • Some institutions and faculty lack funding and resources, need shared infrastructure
    • Specialized equipment may not be readily available
    • Lack of awareness of shared infrastructure and its utility
    • Jupyter/Docker may not work for every artifact; infrastructure should support whatever packaging works for a given artifact. VMs will likely remain runnable for decades, while Docker images tend to break within a few years (see the environment-capture sketch after this list).
  • Artifact evaluation
    • Need proper incentives for everyone involved - authors, reviewers, evaluation chairs, etc.
    • Many artifacts are not reproducible
    • Need metrics and frameworks to measure whether we’re making forward progress
    • How long should it take to reproduce an artifact? One click, 5 minutes, 1 hour? However, we can't expect research software to reproduce that easily.
    • Data might not be accessible (e.g., IRB restrictions, PII, NDAs, proprietary data). How can we prove reproducibility without sharing data (PDaSP)?
    • Lots of time is spent verifying work that may or may not prove impactful; we won't know until later
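
To make the packaging-longevity concern above concrete, one lightweight mitigation is to record the exact environment used to produce a result, independent of any container format. The Python sketch below is purely illustrative (the manifest file name and the use of pip freeze are assumptions, not something prescribed in the session); it captures the interpreter, OS, and pinned package versions so the setup can be rebuilt later even if the original Docker image no longer runs.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def capture_environment(output_path="environment-manifest.json"):
    """Record the exact software environment used to produce a result."""
    # 'pip freeze' lists the installed packages with exact versions.
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    manifest = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": frozen,
    }
    with open(output_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest


if __name__ == "__main__":
    capture_environment()
```

A fresh VM or container can then be reconstructed from the manifest years later, which is harder to do from an unpinned Dockerfile alone.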

4. Are there promising directions for addressing them? What kinds of expertise and collaboration are needed (disciplines and subdisciplines)?

  • Infrastructure
    • Crowdsource and run code on user devices, e.g., IoT Inspector (NSF CIRC)
    • Tools to improve automated deployment of code and data on VMs and clusters
    • Dockerize everything and make that available
    • Provide clear summaries of infrastructure capabilities to help experimenters select best solution
    • Devices don't last forever; provide a repository of legacy devices
  • Artifact evaluation
    • Explore how to set incentives
      • E.g., the SCIENCE Index, an alternative to the h-index that incorporates data-sharing activity
    • Need social and cultural changes
      • Eventually require that ALL papers share artifacts, while realizing that change takes time (cf. RPKI)
      • Submit artifacts together with papers (although PCs and AECs are already overloaded)
      • Have people share code and data by default and reproduce later
      • Give proposals and publications credit for reusing artifacts from previous work
      • Require paper retractions if unable to reproduce results
    • Develop a standard for artifact packaging and documentation, without constraining authors (a hypothetical manifest sketch follows this list)
    • If data is non-sharable, at least run the system on public data and report those results too
    • New replicability tracks at IMC and ACSAC
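
As a concrete, hypothetical illustration of such a packaging standard (the field names and artifact.json layout below are assumptions, not an agreed community format), an artifact could ship a small manifest describing its claims, entry point, and expected runtime, and a simple checker could validate the manifest before submission to an artifact evaluation committee.

```python
import json
import sys

# Hypothetical minimal schema for an artifact manifest (artifact.json);
# the field names are illustrative, not an agreed community standard.
REQUIRED_FIELDS = {
    "title": str,            # paper or artifact title
    "authors": list,         # list of author names
    "license": str,          # e.g., "MIT", "CC-BY-4.0"
    "claims": list,          # paper claims the artifact supports
    "run_command": str,      # single entry point, e.g., "./reproduce.sh"
    "expected_runtime": str, # e.g., "30 minutes on 8 cores"
}


def validate_manifest(path="artifact.json"):
    """Return a list of problems found in the artifact manifest."""
    problems = []
    try:
        with open(path) as f:
            manifest = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read manifest: {exc}"]

    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in manifest:
            problems.append(f"missing field: {field}")
        elif not isinstance(manifest[field], expected_type):
            problems.append(f"field '{field}' should be {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    issues = validate_manifest(sys.argv[1] if len(sys.argv) > 1 else "artifact.json")
    print("OK" if not issues else "\n".join(issues))
```

Even a minimal check like this would give authors and evaluators a shared baseline without constraining how the artifact itself is built.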