1 of 9

Data-driven computation leading to the exascale era

B. Nord

SCD Mini-retreat

2018 April 20

2 of 9

Machine Intelligence Ecosystem

No matter what you call it, it's data-driven modeling with a highly flexible (many-parameter) model.

Specific tools will change, but this concept will remain.

“Advanced Data-Driven Algorithms” (ADDA)

Key concept leading to 2026:

Expand your perspective: re-define ML in your mind as ADDA.
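The "highly flexible (many-parameter) model" idea on this slide can be made concrete with a toy sketch. This is illustrative only (the data and the choice of polynomials are my assumptions, not from the talk): a rigid two-parameter model versus a many-parameter model fit to the same data, where the specific flexible tool (here a polynomial; in ADDA practice, a neural network) is interchangeable.

```python
import numpy as np

# Illustrative sketch: "data-driven modeling with a highly flexible model"
# means letting many free parameters absorb structure from the data,
# rather than hand-picking a small parametric form.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)  # unknown "true" process

# A rigid model: 2 parameters (straight line).
rigid = np.polynomial.Polynomial.fit(x, y, deg=1)

# A flexible model: 15 parameters. The specific tool is interchangeable;
# a neural network plays the same role at much larger parameter counts.
flexible = np.polynomial.Polynomial.fit(x, y, deg=14)

def rmse(model):
    # Root-mean-square residual of the fit on the training data.
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))

print(f"rigid fit RMSE:    {rmse(rigid):.3f}")
print(f"flexible fit RMSE: {rmse(flexible):.3f}")  # lower: more capacity
```

The point of the sketch is the concept, not the tool: swap the polynomial for any many-parameter model and the workflow is unchanged.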

3 of 9

Applications: Science and Systems

  • Science analysis
    • Demonstrated use across HEP: particle physics and cosmology
  • Machine control
    • Accelerators (now-ish)
    • Telescopes (in the future, surely)
  • Compute
    • Job submission, file systems
  • Facilities
    • e.g., energy management
  • Security
    • e.g., GANs
  • Commerce
    • e.g., tech transfer of our algorithms to industry

Key concept leading to 2026:

Cross-cutting applications open us to partnerships and funding opportunities, so keep those cross-cutting elements in mind when pursuing partners.

4 of 9

Algorithm Development

  • Statistical Modeling
    • The potential of these methods is realized only when connected to rigorous statistical modeling
    • e.g., uncertainty estimates
  • Unsupervised learning
    • An avenue for discovery in science; e.g., weakly supervised learning
  • DL is fast-evolving:
    • e.g., generalized CNNs
  • Theory and Interpretability
    • e.g., renormalization group theory

Key concept leading to 2026:

To get the science done, we need rigorous statistics and an understanding of what's going on inside these models.
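The "uncertainty estimates" bullet above can be sketched with one standard statistical ingredient: a bootstrap error bar on a fitted parameter. Everything here (the data, the straight-line model, the `fit_slope` helper) is a hypothetical illustration of the principle, not a method from the talk.

```python
import numpy as np

# Sketch of attaching an uncertainty estimate to a data-driven fit via
# bootstrap resampling. All names and data here are illustrative.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = 2.0 * x + 0.5 + 0.2 * rng.standard_normal(x.size)

def fit_slope(xs, ys):
    # Least-squares slope of a straight-line fit (highest-order coeff first).
    return np.polyfit(xs, ys, deg=1)[0]

# Refit on resampled data many times; the spread of the refitted values
# is an empirical uncertainty on the estimated parameter.
boot = []
for _ in range(1000):
    idx = rng.integers(0, x.size, x.size)  # resample with replacement
    boot.append(fit_slope(x[idx], y[idx]))

slope = fit_slope(x, y)
lo, hi = np.percentile(boot, [16, 84])  # ~1-sigma central interval
print(f"slope = {slope:.2f} (68% interval: {lo:.2f} to {hi:.2f})")
```

The same resampling idea applies unchanged when the "model" is a neural network instead of a line; only the refit cost grows.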

5 of 9

The future of hardware is diverse

  • Diverse Hardware
    • GPUs, Neuromorphic, KNL, FPGA, QC
  • DL on distributed systems is still open territory
  • ORNL’s Summit is being installed now
    • GPU-based systems are one example of this hardware diversity
    • ORNL is well into the design of what comes after Summit
    • Its successor will likely have chips and nodes far more heterogeneous than Summit's
  • Match-making: science problem, hardware, algorithm, software

Key concept leading to 2026:

Be agile to use the right hardware for the right problem and tool.

6 of 9

Partnerships and Agility

Partnerships will be one of the keys to agility and gaining sufficient expertise.

  • Labs:
    • E.g., ORNL and ANL have large compute facilities
  • Universities:
    • E.g., UChicago and TTI have expertise in Deep Learning
  • Tech Sector:
    • Amazon
    • Google
    • Microsoft

Key concept leading to 2026:

Partnerships will be critical for successful science with ADDA.

7 of 9

Timelines

  • Near term:
    • Statistical modeling and interpretability
    • Large-scale distributed DL
    • Keep experimenting with new tools and matching them to problems
    • Develop partnerships to keep options open for testing and building experience
    • Leverage expertise from outside the laboratory
  • Intermediate term:
    • Underlying theory of advanced frameworks, like neural networks
    • Keep experimenting
  • Long term:
    • We should have agile relationships in place so we're up to speed and able to use modern tools

8 of 9

Data-driven computation leading to the exascale era (recap)

  • ML is a microcosm of the data-driven tool space.
    • Tools will change, but concept of highly flexible modeling will remain.
  • Applications:
    • Science analysis, machine control, facilities
    • Many applications beyond science that help science:
      • E.g., job submissions, facilities, file systems
  • Development of data-driven algorithms
    • Potential only realized if connected to rigorous statistical modeling (e.g., uncertainty estimates)
    • Use for discovery in science, e.g., weakly supervised learning
  • Future of hardware is diverse
    • ORNL is already thinking about what comes after Summit, and it looks increasingly heterogeneous
    • Hardware, algorithm, and data need match-making
  • Partnerships with Agility
    • e.g., with Chicagoland institutions for deep-learning partnerships
      • e.g., mitigates the challenge of hiring data-science professionals
    • e.g., other universities, like UTK
      • e.g., to take advantage of opportunities for hardware access

9 of 9

Astro Community

  • Image processing needs large-memory jobs (OSG is not great for that)
  • MCMC requires highly parallel computing (FermiGrid is not great for that)
  • New databasing infrastructure
  • LSST is entering the exascale era (a predicted 0.5 exabytes of data)
  • LSST plans to bring the users to the data rather than sending the data to the users
  • Combining data sets: set up a library/service for joint analysis?
  • Other issues
    • I/O intensive interactive/development environments
    • New databasing technologies
    • Processing pipelines (Parsl, etc.)
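Why MCMC wants "highly parallel computing" can be shown in a few lines: independent chains share no state, so each can run on its own core, node, or grid job. A minimal sketch (the toy posterior, proposal scale, and chain settings are my assumptions, not from the slides):

```python
import numpy as np

# Minimal Metropolis-Hastings chain. Independent chains are embarrassingly
# parallel: on a cluster, each call below would be a separate job/rank.
def run_chain(seed, n_steps=5000):
    rng = np.random.default_rng(seed)
    log_post = lambda t: -0.5 * (t - 3.0) ** 2  # toy posterior: N(3, 1)
    theta = rng.normal()                        # random start per chain
    samples = np.empty(n_steps)
    for i in range(n_steps):
        prop = theta + rng.normal(scale=0.5)    # random-walk proposal
        if np.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop                        # accept the move
        samples[i] = theta
    return samples[n_steps // 2:]               # discard burn-in

# Four independent chains; agreement of their means is a convergence check.
chains = [run_chain(seed) for seed in range(4)]
means = [float(c.mean()) for c in chains]
print("per-chain posterior means:", [f"{m:.2f}" for m in means])
```

Because the chains never communicate, the wall-clock time for N chains on N cores is that of one chain, which is exactly the workload shape FermiGrid-style serial farms handle poorly but HPC and HTC-with-many-cores handle well.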