4. Cyberinfrastructure for Reproducible Experimentation

Co-Leads: David Balenson (University of Southern California Information Sciences Institute), Patrick Traynor (University of Florida)

Google Drive folder w/ slides and other materials: https://bit.ly/ci-breakout

Breakout Session Description

Enabling reproducible experimentation on shared hardware that is easily and remotely accessible by all researchers has the potential to democratize cybersecurity and privacy research and especially benefit underserved researchers and students, enabling them to compete on an equal footing with those from top-tier institutions.

This session will explore NSF-funded research infrastructure such as Chameleon, CloudLab, FABRIC, POWDER, and SPHERE (formerly DETER) and the hardware, software, and other capabilities needed to support reproducible experimental research in cybersecurity and privacy.

The session will explore questions such as:

  • what is needed for experimentation in different fields of cybersecurity and privacy research;
  • how researchers experiment today (e.g., in a lab, in a testbed, on the real Internet);
  • what would a testbed have to offer to be able to experiment in it; and
  • how we as a community can improve the quality and increase the reuse of cybersecurity artifacts (code, datasets, experiment scenarios, etc.) in published papers?

Participants

  • David Balenson, USC-ISI (co-lead)
  • Patrick Traynor, Florida (co-lead)
  • Manuel Egele, Boston University (scribe)
  • Yixin Sun, University of Virginia (scribe)
  • Chris Martens, Northeastern University
  • Christian Collberg, University of Arizona
  • Danny Y. Huang, New York University
  • Jelena Mirkovic, USC-ISI
  • John Liagouris, Boston University
  • Oshani Seneviratne, RPI

1. What is the topic? Why is it important to society? To a secure and trustworthy cyberspace? In other ways?

  • Validation of experimental results*
    • Repeatability: same team, same infrastructure
    • Reproducibility: new team, same infrastructure
    • Replicability: new team, new infrastructure
  • Cornerstone of good science
  • Without reproducibility we cannot advance as a science
  • Helps quantify uncertainty
  • Leads to transparency, which can help identify potential limitations
  • Building on the work of others avoids starting from scratch, incurring overhead, and slowing down progress
  • Can be a bridge for transition to practice, which can be difficult
  • Helps researchers understand how to run systems when delving into a problem area anew (training a new generation of researchers)

* 2019 NAS report on Reproducibility and Replicability in Science, https://nap.nationalacademies.org/catalog/25303/reproducibility-and-replicability-in-science

2. Is there an existing body of research and/or practice? What are some highlights or pointers to it?

  • Infrastructure
    • Individual researcher and lab infrastructure
    • University resources
    • Shared resources: Chameleon, CloudLab, FABRIC, POWDER, SPHERE, …
  • Artifact evaluation
    • Initiatives at top conferences: WiSec, ACSAC, USENIX Security, ACM CCS, NDSS, IEEE S&P (forthcoming!), CHES, SysTEX, WOOT, …
    • Evaluation and badging systems: ACM, IEEE, USENIX Security, NDSS, …
    • Artifact catalogues: findresearch.org, secartifacts.github.io, SEARCCH
    • Artifact repositories: GitHub, Zenodo, Comunda
  • Reproducibility studies

Limited availability of ML artifacts; most are not runnable or do not provide clear output

3. What are important challenges that remain? Are there new challenges that have arisen based on new models, new knowledge, new technologies, new uses, etc.?

  • Infrastructure
    • Some institutions and faculty lack funding and resources, need shared infrastructure
    • Specialized equipment may not be readily available
    • Lack of awareness of shared infrastructure and its utility
    • Jupyter/Docker may not work for every artifact; infrastructure should support whatever packaging works for a given artifact. VMs will likely remain runnable for decades, while Docker images tend to break within a few years (see the environment-capture sketch after this list).
  • Artifact evaluation
    • Need proper incentives for everyone involved - authors, reviewers, evaluation chairs, etc.
    • Many artifacts are not reproducible
    • Need metrics and frameworks to measure whether we’re making forward progress
    • How long should it take to reproduce an artifact? One click, 5 minutes, 1 hour? However, we can't expect research software to reproduce that easily.
    • Data might not be accessible (e.g., IRB restrictions, PII, NDAs, proprietary data). How can we prove reproducibility without sharing data (PDaSP)?
    • Lots of time is spent verifying work that may or may not prove impactful; we won't know until later
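
To make the packaging-longevity concern above concrete, one lightweight mitigation is to record the exact environment used to produce a result, independent of any container format. The Python sketch below is purely illustrative (the manifest file name and the use of pip freeze are assumptions, not something prescribed in the session); it captures the interpreter, OS, and pinned package versions so the setup can be rebuilt later even if the original Docker image no longer runs.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def capture_environment(output_path="environment-manifest.json"):
    """Record the exact software environment used to produce a result."""
    # 'pip freeze' lists the installed packages with exact versions.
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    manifest = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": frozen,
    }
    with open(output_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest


if __name__ == "__main__":
    capture_environment()
```

A fresh VM or container can then be reconstructed from the manifest years later, which is harder to do from an unpinned Dockerfile alone.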

4. Are there promising directions for addressing them? What kinds of expertise and collaboration are needed (disciplines and subdisciplines)?

  • Infrastructure
    • Crowdsource and run code on user devices, e.g., IoT Inspector (NSF CIRC)
    • Tools to improve automated deployment of code and data on VMs and clusters
    • Dockerize everything and make that available
    • Provide clear summaries of infrastructure capabilities to help experimenters select best solution
    • Devices don't last forever; provide a repository of legacy devices
  • Artifact evaluation
    • Explore how to set incentives
      • E.g., the SCIENCE Index, an alternative to the h-index that incorporates data-sharing activity
    • Need social and cultural changes
      • Eventually require that ALL papers share artifacts, while realizing that change takes time (cf. RPKI)
      • Submit artifacts together with papers (although PCs and AECs are already overloaded)
      • Have people share code and data by default and reproduce later
      • Give proposals and publications credit for reusing artifacts from previous work
      • Require paper retractions if unable to reproduce results
    • Develop a standard for artifact packaging and documentation, without constraining authors (a hypothetical manifest sketch follows this list)
    • If data is non-sharable, at least run the system on public data and report those results too
    • New replicability tracks at IMC and ACSAC
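
As a concrete, hypothetical illustration of such a packaging standard (the field names and artifact.json layout below are assumptions, not an agreed community format), an artifact could ship a small manifest describing its claims, entry point, and expected runtime, and a simple checker could validate the manifest before submission to an artifact evaluation committee.

```python
import json
import sys

# Hypothetical minimal schema for an artifact manifest (artifact.json);
# the field names are illustrative, not an agreed community standard.
REQUIRED_FIELDS = {
    "title": str,            # paper or artifact title
    "authors": list,         # list of author names
    "license": str,          # e.g., "MIT", "CC-BY-4.0"
    "claims": list,          # paper claims the artifact supports
    "run_command": str,      # single entry point, e.g., "./reproduce.sh"
    "expected_runtime": str, # e.g., "30 minutes on 8 cores"
}


def validate_manifest(path="artifact.json"):
    """Return a list of problems found in the artifact manifest."""
    problems = []
    try:
        with open(path) as f:
            manifest = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read manifest: {exc}"]

    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in manifest:
            problems.append(f"missing field: {field}")
        elif not isinstance(manifest[field], expected_type):
            problems.append(f"field '{field}' should be {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    issues = validate_manifest(sys.argv[1] if len(sys.argv) > 1 else "artifact.json")
    print("OK" if not issues else "\n".join(issues))
```

Even a minimal check like this would give authors and evaluators a shared baseline without constraining how the artifact itself is built.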