
Advances in High Performance Computing and Deep Learning: Data Engineering and Data Science


Geoffrey Fox

Digital Science Center, Indiana University, gcf@indiana.edu


Interesting Changes in Fields and Communities

  • At the beginning of parallel computing, hardware, software, and algorithms were all vibrant areas
    • and leading areas of computer science research -- everybody had transputers or iPSC/2s
  • After a while, hardware became hard to pursue in universities because of the infrastructure needed
  • Later, (parallel) algorithms were so successful that perhaps interest waned
  • High Performance Computing (HPC) thrived due to NSF/DoE/Europe/Japan/China national networks and the success of computational science algorithms
    • Judged by citations, conference attendance, industry jobs, and faculty jobs
  • Big Data and Deep Learning changed nearly everything
  • HPC has become essential infrastructure; more engineering than research, with fewer academic opportunities (there should be more). Industry and DoE in the USA have jobs.
  • Algorithms are thriving in the form of deep learning (DL), including simulation surrogates
    • Other forms of machine learning are in decline, replaced by DL
  • Big data problems are not as parallel as simulations; typically modest-size parallel machines can cope (combined with a large number of independent or loosely coupled jobs)
  • Software is as always active, although the top areas are changing


Remarks on the Convergence of Big Data, Simulation, and HPC

  • HPC integration with both Big Data and (Big) Simulation has one clear aspect: both areas need the high performance reached by HPC technology
    • Clear in the exascale initiative, the success of computational science, and GPUs for Deep Learning
    • Part of the Systems for ML theme in Dean's talk at NeurIPS 2017 for Big Data
    • We term this HPCforML; it is reasonably clear how to support it in hardware and software, as seen in DoE systems
    • Details are uncertain, partly due to uncertainties in industry directions, HPC clouds vs. supercomputers, etc.
  • We can also integrate Big Data technology (from the Apache software stack to containers) with simulations
  • Dean also discussed the use of machine learning to enhance systems, which becomes MLforHPC when the system is built on HPC technology
    • This actually involves applications and not just the system
  • Even broad principles for MLforHPC software and hardware support are unclear at this early stage


HPCforML: Similar Challenges in Parallelism for Big Data and Simulation (Complexity of Synchronization and Parallelization)


[Figure: problem categories arranged by synchronization/parallelization complexity, from loosely coupled through regular coupling to complex coupling, with increasing data on one axis and simulations on the other]

  • Pleasingly Parallel (loosely coupled; straightforward or user-performed parallelism): often independent events; MapReduce as in scalable databases; the current major Big Data category; parameter sweep simulations; runs on commodity clouds
  • Structured Adaptive Sparse (regular coupling): regular simulations; HPC clouds with accelerators and high-performance interconnect
  • Global Machine Learning (e.g., parallel clustering) and Deep Learning: linear algebra at the core (often not sparse); HPC clouds/supercomputers
  • Unstructured Adaptive Sparse (complex coupling): graph analytics, e.g., subgraph mining, and LDA; memory access also critical


HPCforML: Integration Challenges

NIPS 2015: http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

This well-known paper points out that parallel high-performance machine learning (the small "ML Code" box in its famous diagram) is perhaps the most fun part but just one piece of the system. We need to integrate it with the other data and orchestration components.

This integration is neither very good nor easy today, partly because data management systems like Spark are JVM-based, which does not link cleanly to the C++/Python world of high-performance ML.

Twister2 and Cylon at IU address this.


High Performance Computing and Deep Learning

  • ML for HPC
  • Surrogates


Let’s look at ML for HPC

  • Traditionally we did HPC for ML -- run deep learning on a GPU -- BUT ML for HPC is probably more interesting
  • Currently in science, ML is used to enhance simulations (not data analytics), and the ML used is dominantly Deep Learning
  • We introduced 3 major categories and 8 subcategories:
    1. Improving Simulation with ML-controlled Configurations and Integration of Data
    2. Use ML to Learn Structure, Theory and Model for Simulation
    3. Use ML to Learn Surrogates for Simulation
  • Work with Shantenu Jha


[Diagram: INPUT → simulation → OUTPUT, annotated with the eight MLforHPC subcategories]

1. Improving Simulation with Configurations and Integration of Data
   • 1.1 MLAutotuningHPC – Learn configurations
   • 1.2 MLAutotuningHPC – Learn models from data
   • 1.3 MLaroundHPC – Learning Model Details (ML-based data assimilation)
2. Learn Structure, Theory and Model for Simulation
   • 2.1 MLAutotuningHPC – Smart ensembles
   • 2.2 MLaroundHPC – Learning Model Details (coarse graining, effective potentials)
   • 2.3 MLaroundHPC – Improve Model or Theory
3. Learn Surrogates for Simulation
   • 3.1 MLaroundHPC – Learning Outputs from Inputs (parameters)
   • 3.2 MLaroundHPC – Learning Outputs from Inputs (fields)


Examples of ML for HPC (work with JCS Kadupitiya, Vikram Jadhao)

  • Uses a quite small multi-layer perceptron (MLP) to predict 150 observables from 5 input parameters (~5000 simulations in the training set); a sketch of such a surrogate follows below
  • The MLP outperforms other ML choices
  • Deployed on nanoHUB for education (an attractive use of surrogates, since students get answers fast)
  • General Electric uses a similar approach to give interactive engine design options (200 in the training set)

[Figure: "The Learning Net", with direct simulation compared to surrogates; speedup → 10⁶ as Nlookup → ∞]

  • Extraction of ionic structure in electrolyte solutions confined by planar and spherical surfaces
  • Classic HPC code written in C++ and accelerated with hybrid MPI-OpenMP
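A minimal sketch of such an MLP surrogate (illustrative only, not the nanoHUB code; layer sizes and the random stand-in data are assumptions):

```python
# Sketch: a small MLP surrogate mapping 5 simulation input parameters to 150
# predicted observables, as described on this slide. Data here is random
# stand-in for the ~5000 training simulations.
import numpy as np
import tensorflow as tf

X = np.random.rand(5000, 5).astype("float32")    # 5 input parameters
Y = np.random.rand(5000, 150).astype("float32")  # 150 observables

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(5,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(150),                  # linear output for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=10, batch_size=64, validation_split=0.1)

# Inference replaces a full MPI-OpenMP simulation run with one forward pass.
pred = model.predict(X[:1])
```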


Up to two billion times acceleration of scientific simulations with deep neural architecture search

  • January 23, 2020: https://arxiv.org/pdf/2001.08055.pdf
  • 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, all using the same super-architecture, algorithm, and hyperparameters
  • The approach also dynamically chooses the deep network and provides uncertainty estimation, adding further confidence in the surrogates' use


Insilico Medicine Used Creative AI to Design Potential Drugs in Just 21 Days

  • Map drug (material) structure to drug (material) properties
  • Hong Kong-based Insilico Medicine sent shockwaves through the pharma industry after publishing research in Nature Biotechnology showing its AI-powered drug discovery system was capable of producing at least one potential treatment for fibrosis in less than a month's time.
  • The system uses a deep reinforcement learning algorithm that can imagine potential protein structures based on existing research and certain preprogrammed design criteria.
  • Insilico's system initially produced 30,000 possible designs, which the research team whittled down to six that were synthesized in the lab, with one design eventually tested on mice with promising results.
  • Insilico's AI-powered research process could offer a massive push forward for the pharmaceutical industry, which faces increasingly high drug development costs. In just a handful of weeks and for approximately $150,000, Insilico delivered what typically takes pharmaceutical companies $2.6 billion over seven years.


September 4, 2019 news item


Operator Formulation of Deep Learning Inference

  • Suppose we are solving PDEs or sets of coupled ODEs
  • Typically we solve iteratively: New Values = (Differential Operator) Previous Values
  • Classic applied math gives nifty difference equations and spectral methods to represent the operator numerically
  • Deep Learning learns the operator from classic numerics, observational data, or their combination
  • Inference is then: New Values = (DL Operator) Previous Values, as sketched below
  • This new nonlinear trained DL operator can allow much larger time steps, incorporate variations in parameters, learn potentials, etc.
  • The DL operator is the new theory (Newton's laws) of science
  • High-order approximations are traditionally very sensitive to noise, and one was taught to avoid them, but deep NNs are the opposite: both verbose and robust
    • Compare the DL operator with multiple LSTM layers and 100s to 100,000 parameters
    • Newton's laws for this have 2-4 parameters


Learn Newton’s laws with Recurrent Neural Networks

  • Deep Learning is revolutionizing (spatial) time series analysis
  • A good example is integrating sets of differential equations
  • Train the network on traditional 5-time-step series from (Verlet) difference equations
  • Verlet needs a time step of 0.001 for reliable integration, but the learnt LSTM network is reliable for time steps 4000 times longer, and it also learns the potential
  • The speedup is 30000 on 16 particles interacting with Lennard-Jones potentials
  • 2-layer LSTM network with 64 units per layer: 65,072 trainable parameters
  • 5000 training simulations

[Figure: RNN error² up to step size dT = 4 and total time 10⁶, compared with Verlet error² at dT = 0.01 and 0.1]

JCS Kadupitiya, Vikram Jadhao


Results on different potentials (one particle)

  • Simple Harmonic Oscillator:
    • Hooke's law in velocity Verlet (VV)
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 1.8 sec
    • Mass (m) and spring constant (k) varied: 500 initial configurations
  • Lennard-Jones:
    • Lennard-Jones (LJ) potential in VV
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 2.7 sec
    • Mass (m) and initial position (x0) varied: 500 initial configurations
  • Double Well:
    • U(x) = x⁴/4 - x²/2 in VV
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 1.9 sec
    • Mass (m) and initial position (x0) varied: 500 initial configurations
  • The VV integrator that generates these training trajectories is sketched below
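A minimal sketch of the velocity Verlet (VV) integrator referenced above, shown for the double-well potential; the initial condition is an illustrative assumption, not the authors' setup:

```python
# Sketch: velocity Verlet for the double-well potential U(x) = x^4/4 - x^2/2,
# whose force is F(x) = -dU/dx = x - x^3.
import numpy as np

def velocity_verlet(x0, v0, m, dt, n_steps, force):
    xs = np.empty(n_steps)
    x, v = x0, v0
    a = force(x) / m
    for i in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / m
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged force
        a = a_new
        xs[i] = x
    return xs

# T = 100 with dt = 0.001, as on the slide; one of the 500 varied initial configs.
traj = velocity_verlet(x0=1.2, v0=0.0, m=1.0, dt=0.001, n_steps=100_000,
                       force=lambda x: x - x**3)
```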

[Figure: classic simulation error for each potential]

Multiple versions of MLforHPC used in Simulating Biological Organisms (with James Glazier @IU)

  • Learning Model (Agent) Behavior: replace components by learned surrogates (reaction kinetics coupled ODEs)
  • Dynamic Data Assimilation
  • Theory to Instance
  • Smart Ensembles
  • All steps use MLAutotuning


Futures of ML for HPC

  • ML for HPC applies broadly, but current use is nonuniform across domains
  • Use of modest DL networks to map material/potential-drug structure to properties (generalized QSAR) with simulation and observation: advanced progress
  • Learning surrogates for large-scale simulations: good results with major speedups
  • Use of MLforHPC in agent-based systems (replace agents by learned surrogates): very promising but few results
    • Use in sociotechnical simulations and in virtual tissues (agents are people or cells)
  • Macroscopic structure, as in learning complex multi-particle potentials scaling as N⁷: many great successes
  • Learning collective coordinates and guiding ensemble computations: dramatic progress with speedups up to 10⁸
  • Microscale: learning the dynamics of small scales such as clouds and turbulence: interesting results but much more to do
  • Use of recurrent NNs to represent dynamics (learn numerical differential operators): promising but only studied on small problems
  • Learn errors as well as values in differential equation solutions
  • Minimize the number of expensive HPC simulations needed: the 2-billion paper is intriguing


Looking at COVID Distributions

  • Work with the Public Health Department at Pittsburgh
  • Gregor von Laszewski


Time Series Represented by Deep Learning

  • A Molecular Dynamics solution is "just" a time series, and we saw that DL derived an operator to describe it
    • Classic applied math also gives a numerical operator based on Newton's laws
  • Time series, or rather sequences, have two important cases
  • Forecast the future of time series observables:
    • Solve differential equations
    • Ride hailing and eCommerce delivery
    • Earthquakes
    • Environmental science
    • Spread of COVID
  • Predict new observables in the time interval of the time series (a sequence-to-sequence map):
    • Natural Language Processing: Turkish to English
    • Scheduling clouds or HPC systems


Basic Spatial (bag) Time Series

[Figure: data laid out over space x and time t. Space points are different data sources, not necessarily nearby. Input properties at each point are static (e.g., %Seniors) or dynamic (e.g., COVID cases per day). The data analysis unit is the time sequence at one space point. Two tasks: forecast the future (any number of time units, any number of properties), or predict now via a seq2seq map, as in English to French or rainfall to runoff. For Natural Language Processing, space points are different paragraphs or books, with a few sentences at each point; earthquake points are nearby.]


General Deep Learning Strategies

  • Attention means that you "learn" from other related data (the past)
  • A Transformer uses matrix features, comparing structure (scalar products) across other time and space points
  • An LSTM uses history passed through the time sequence
  • There are just a few studies of Transformers for the forecasting problem
  • Current implementations look at full attention or merge attention from space and time separately, but a lot of research is needed here for both training and inference

[Figure: the two different models used. (1) A hybrid Transformer (for the encoder) and LSTM (for the decoder); (2) a pure LSTM. Each stacks inputs, two LSTM layers with initial and final states, and outputs; the attention merge in the hybrid model is optional. The attention for point i is Q(from i) Kᵀ(from j) V(j) summed over j, where Q, K, V are dense layers on the input, i.e., linear combinations plus activations.]
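A minimal NumPy sketch (assumed, not the talk's implementation) of the attention rule in the figure; dimensions and weights are illustrative:

```python
# Sketch: attention for point i is sum over j of softmax(Q_i . K_j) V_j,
# where Q, K, V come from dense layers (linear combinations plus activations).
import numpy as np

def dense(x, W, b):
    return np.maximum(x @ W + b, 0.0)       # linear combination + ReLU activation

rng = np.random.default_rng(0)
n_points, d_in, d_k = 9, 16, 8              # e.g., a window of 9 sequence points
x = rng.normal(size=(n_points, d_in))

Wq, Wk, Wv = (rng.normal(size=(d_in, d_k)) for _ in range(3))
b = np.zeros(d_k)
Q, K, V = dense(x, Wq, b), dense(x, Wk, b), dense(x, Wv, b)

scores = Q @ K.T / np.sqrt(d_k)             # scalar products across points
scores -= scores.max(axis=1, keepdims=True) # numerically stable softmax
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)
attended = weights @ V                      # for each i: sum over j of w_ij V_j
```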


Pure LSTM description of the 205-day, 314-city data

[Figure: √(daily counts) summed over cities, from fits to individual times/cities for daily data with a 2-week prediction; red is the error]


Hybrid Transformer with intrinsic error for 314 cities/counties and 159 days (7.11 secs/epoch)


Hybrid Transformer with larger intrinsic error for 110 cities/counties and 115 days


Particular Regions from the Hybrid Transformer

[Figures: New York City; Chicago (Cook County)]


Particular Regions from the Hybrid Transformer

[Figures: Seattle (King County); Los Angeles]


Comments I

  • Google Colab Pro with 3300 lines of Python
    • TensorFlow with custom training; 5-10 seconds per epoch with a GPU
  • We used positional encoding of both input and output
    • The NLP Transformer uses a related but different mechanism -- the two need to be compared
  • Weekly structure motivates cos θ and sin θ as input and output time series, where θ runs from 0 to 2π over 7 days (sketched below)
    • A similar strategy could be used for other time periods (e.g., annual environmental data, daily traffic patterns)
    • Linear indicators for space and time were also used
  • The encoding is used in both the LSTM and hybrid models and clearly improves the fit
  • Inputs: daily cases and fatalities; 12-30 static properties; 4 encodings
    • Daily social distancing added for the 314-county dataset
  • Predictions: cases and fatalities for the next day and 14 days into the future; 4 encodings (weighted in the loss function); 34 outputs in total
    • Missing data is ignored in the custom loss function
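A minimal sketch of the weekly periodic encoding described above; the array sizes are illustrative:

```python
# Sketch: theta runs from 0 to 2*pi over each 7-day week, giving cos/sin
# features, plus a linear time indicator; analogous encodings can cover
# space and other periods (annual, daily).
import numpy as np

days = np.arange(205)                      # e.g., the 205-day dataset
theta = 2 * np.pi * (days % 7) / 7.0       # 0 -> 2*pi across each week

weekly_cos = np.cos(theta)
weekly_sin = np.sin(theta)
linear_time = days / days.max()            # linear time indicator in [0, 1]

encodings = np.stack([weekly_cos, weekly_sin, linear_time], axis=1)
```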


Comments II

  • Two datasets
    • 110 counties with 115 days (until May 25) and 30 static properties such as %seniors, %Hispanic, an asthma measure, number of beds, etc.
    • 314 counties with 205 days (until August 13) and 12 refined static properties
  • Results are shown summed over cities or for individual locations
  • The model is fitted to the square root of daily cases/deaths, as this agrees with an MSE loss function with uniform errors (for counting errors, √N has error O(1))
  • Window sizes 5-13 all look quite good; currently a sliding window of size 9 is used (sketched below)
  • The Transformer has a nontrivial issue in calculating total attention, as each sequence can match 553,895 other points
    • Calculated by Monte Carlo over attention points and groupings
    • The fit uses groups of 314-by-9 sequences randomized over space and time
  • Inference looks at further choices, but the larger dataset shows little error from the Monte Carlo
  • Current results use 4 attention heads
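A minimal sketch (with made-up stand-in counts) of the data preparation described above: the square-root transform and size-9 sliding windows randomized over space and time:

```python
# Sketch: fit to sqrt(daily counts) so counting errors are roughly uniform,
# then build sliding windows of size 9 over each county's time series.
import numpy as np

n_counties, n_days, window = 314, 205, 9
daily_cases = np.random.poisson(50.0, size=(n_counties, n_days)).astype("float64")

targets = np.sqrt(daily_cases)             # sqrt(N) has error O(1) for counts

samples = [
    targets[c, t:t + window]               # one window = 9 consecutive days
    for c in range(n_counties)
    for t in range(n_days - window)
]
X = np.stack(samples)                      # groups of 314-by-9 sequences,
rng = np.random.default_rng(0)             # randomized over space and time
rng.shuffle(X)
```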


Collection of Time Series Machine Learning Algorithms (MLPerf)

Areas and applications, with models, datasets, and papers:

  • Cars, taxis, freeway detectors: models TT-RNN, BNN, LSTM; papers [6-8]
  • Wearables and medical instruments (EEG, ECG, ERP, patient data): models LSTM, RNN; datasets OPPORTUNITY [9-10], EEG [11-14], MIMIC [15]; papers [16-20]
  • Intrusion, traffic classification, anomaly detection: model LSTM; papers [21, 23-25]
  • Household electric use, economics, finance, demographics, industry: models CNN, RNN; papers [28-29]
  • Stock prices versus time: models CNN, RNN; data available academically from Wharton [30]; paper [31]
  • Climate, tokamak: models Markov, RNN; papers [33-35]
  • Events: model LSTM; papers [36-37]
  • Language and translation (pre-trained data): model Transformer [38]; datasets [39-40]; papers [41-42], Mesh TensorFlow
  • All-neural on-device speech recognizer: model RNN-T; paper [43]
  • IndyCar racing (real-time car and track detectors): models HTM, LSTM; paper [44]
  • Twitter online clustering: data available from Twitter; papers [45-46]

Xinyuan Huang from Cisco and MLPerf


Lots of Applications and DL Opportunities: Hydrology

  • Essentially all application areas are in an exploratory stage, and many make only modest use of deep learning
  • Hydrology has an impressive 129 deep learning papers studying issues such as rainfall and runoff and their relation to terrain
  • I studied a few of these papers and believe that, although sound and interesting, none of them use the optimal choices of network and hyperparameters
  • We should gather examples and challenge the community to improve, as well as looking at other communities
  • We propose combining this with benchmarking activity and the industry MLPerf effort


https://eartharxiv.org/xs36g/


Time Series and Operators for Earthquakes

  • Earthquakes have a solid physics theory for the movement of the earth, but it is not very useful as it depends on largely unknown data on friction laws and fault structures
    • We observe this theory and its controlling data (the hidden variables) through observations of earthquakes over the years
  • The earthquake operator predicts probabilities of earthquakes of different magnitudes over different time intervals
    • As faults/data vary, this operator should be "nearest neighbor" spatial (2D)
  • Think of the region as an image with pixels holding log(total energy) in a time interval and use a ConvLSTM (sketched below)
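A minimal sketch (an assumption, not the actual model) of the ConvLSTM idea with random stand-in data; the grid size and window length are illustrative:

```python
# Sketch: treat the region as an image whose pixels hold log(total seismic
# energy) per time interval, and predict the next image from a window of
# previous ones with a ConvLSTM.
import numpy as np
import tensorflow as tf

frames, height, width = 8, 40, 60          # illustrative grid over the region
X = np.random.rand(100, frames, height, width, 1).astype("float32")
Y = np.random.rand(100, height, width, 1).astype("float32")   # next interval

model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same",
                               input_shape=(frames, height, width, 1)),
    tf.keras.layers.Conv2D(1, kernel_size=1),  # per-pixel energy prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=2, batch_size=8)
```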

[Figure: predicted vs. true values from the test set]


High Performance Computing and Deep Learning Benchmarks


MLPerf Consortium Deep Learning Benchmarks

Some Relevant Working Groups

  • Training
  • Inference (Batch and Streaming)
  • TinyML (embedded)
  • Deep Learning for Time Series
  • Power
  • Datasets
  • HPC (DoE Labs)
  • Research
  • Science Data (just approved)

MLPerf's mission is to build fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.

Benchmark what the user sees

Used for purchasing decisions worth >$1B USD in a rapidly growing market (the ML chipset market in 2025 is ~$60B)

Total effort: 50 FTE

  • Accelerate progress in ML via fair and useful measurement
  • Serve both the commercial and research communities
  • Enable fair comparison of competing systems yet encourage innovation to improve the state-of-the-art of ML
  • Enforce replicability to ensure reliable results
  • Keep benchmarking effort affordable so all can participate

73 companies; 10 universities


MLPerf Industry Machine Learning Site: Training v0.7

https://mlperf.org/ -- ALL deep learning, but it's MLPerf, not DLPerf

[Figure: the Training v0.7 benchmarks: images, images, images, translation, translation, voice, recommender, play Go. The largest submissions, 2048 TPUs or 1536 V100s with InfiniBand, are quite powerful.]


Science Data WG in MLPerf

  • Extends MLPerf to address Science Research Data
  • There is no existing scientific data benchmarking activity with a similar flavor to MLPerf -- namely, one addressing important realistic problems and aiming at modern data analytics, including deep learning, on modern high-performance analysis systems.
  • Further, the challenges of science data benchmarking both benefit from the approach of MLPerf and will be synergistic with existing working groups.
  • Science, like industry, involves edge and data-center issues, end-to-end systems, inference, and training. There are some similarities in the datasets and analytics, as both industry and science involve image data, but also differences; science data associated with simulations and particle physics experiments is quite different from most industry exemplars.
  • Science datasets are often large and growing in size, while the multitude of active areas gives diverse challenges. The best practice science algorithms are shifting to deep learning approaches as in industry today.
  • Benchmarks will help more science fields take advantage of modern ML


Science Data MLPerf working group

  • We foresee that scientific machine learning benchmarks for MLPerf will include a number of datasets, from each of the scientific domains, along with a representative problem from those domains.
  • When fully contributed, the benchmark suite will cover (at least) the following domains: material sciences, environmental sciences, life sciences, fusion, particle physics, astronomy, earthquake and earth sciences, with more than one representative problem from each of these domains

35

One aim is to provide a mechanism for assessing the capability of different ML models in addressing different scientific problem

Build tutorials around benchmarks


Possible Initial MLPerf Science Benchmarks

  • SciML: a suite of scientific machine learning benchmarks from Tony Hey and Jeyan Thiyagalingam; all data has been labelled; all open source
  • Cloudmask: identify clouds at the pixel level in multispectral satellite images; U-Net reference implementation; 1.8 TB
  • EMNoise: increase the signal-to-noise ratio of electron microscope images; many DL/ML approaches, but a U-Net reference implementation is provided; 5 GB
  • DiffuseScatter: analyze DMS (Diffuse Multiple Scattering) from materials irradiated by light sources to choose between 2 crystal structures; CNN reference implementation; 9 GB
  • The repository below contains reference implementations and command line tools for easily configuring and running the benchmarks


https://github.com/stfc-sciml/sciml-benchmarks


Cloud Masking

  • Given a set of satellite images, identify the pixels that are cloud
  • Pixel-wise classification: cloud / non-cloud
  • Confusions: sun glint, fog, dust plumes, snow, sea ice, etc.
  • No ground truth

Segmentation & Classification


SBI: Surrogate Benchmark Initiative
FAIR Surrogate Benchmarks Supporting AI and Simulation Research

PI: Geoffrey Fox, IU

Replacing traditional HPC computations with Deep Learning surrogates can improve the performance of simulations and make optimal use of diverse architectures

  • Fitting of hardware to surrogates
  • Uncertainty Quantification of the surrogate estimates
  • Minimize Training Data Size needed to get reliable surrogates for a given accuracy choice.
  • Develop and test surrogate Performance Models

SBI collaborates with Industry and a leading machine learning benchmarking activity -- MLPerf

GOAL: Accelerate and better understand Deep Learning surrogate models that can replace all or part of traditional large-scale HPC computations with major performance increases.

Software Research: SBI will design and build general middleware to support the generation and the use of surrogates.

A Findable, Accessible, Interoperable, and Reusable (FAIR) data ecosystem for HPC surrogates

Application Benefits: SBI will also make it easier for general users to develop new surrogates and help make their major performance increases pervasive across DoE computational science.


Technology for Benchmarking

  • Build on MLPerf benchmarking technology
  • MLBox access to data and models from Kubernetes, Jupyter, and HPC
  • Build FAIR (Findable, Accessible, Interoperable and Reusable) metadata on top of the work of the Logging and Platform Working Groups
  • Incorporate this in demonstrations and tutorials


Call for Action!

  • Develop best practice, datasets, tutorial material, and benchmarking technology
  • Join the MLPerf Science Data working group by joining MLPerf (automatic at the web site https://groups.google.com/forum/#!forum/mlperf) and then requesting to join working groups at https://mlperf.org/get-involved/#join-working-groups
    • We would have a special relationship with the HPC MLPerf working group, as much scientific data is analyzed by HPC systems
    • We would have synergy with the Deep Learning for Time Series (DeepTS) MLPerf working group, as many scientific datasets correspond to time series
    • Join these other working groups!


Data Engineering and Deep Learning

Deep Learning Infrastructure with Cylon and Twister2

[Figure: the technical-debt diagram again, now centered on a small "DL Code" box: it implies we need deep learning plus general data engineering]


Deep Learning Workflow

The workflow often divides into two parts:

Data => Information: preprocessing with Hadoop, Spark, Twister2, Scikit-Learn

Information => Knowledge: the compute-intensive step with Cylon-enhanced Spark, Twister2, PyTorch, and TensorFlow


Data Engineering

  • Data engineering is formulating structured data from raw data with ETL (extract, transform, load) operations; a small sketch follows below
  • Data engineering enables Deep Learning (DL) and Machine Learning (ML) workflows
  • Data engineering is used in:
    • Model prototyping
    • Models in production
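A minimal illustrative ETL sketch in pandas (a made-up example, not from the talk; the column names and values are assumptions):

```python
# Sketch: extract raw records, transform them into typed structured data,
# and load a table ready for a DL/ML workflow.
import pandas as pd

raw = pd.DataFrame({                       # extract: raw event records
    "timestamp": ["2020-08-01", "2020-08-02", None],
    "county": ["Marion", "Monroe", "Marion"],
    "cases": ["12", "7", "9"],
})

df = raw.dropna(subset=["timestamp"])      # transform: clean and type the data
df = df.assign(
    timestamp=pd.to_datetime(df["timestamp"]),
    cases=df["cases"].astype(int),
)
table = df.groupby("county")["cases"].sum().reset_index()

table.to_csv("cases_by_county.csv", index=False)  # load: structured output
```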


Two Ecosystems

  • Enterprise: Java (initial data engineering)
  • Research labs, universities: Python (final deep learning)

GOAL: High performance within each ecosystem and high-performance integration between the two ecosystems.


Twister2

[Figure: the Big Data processing ecosystem. Dataflow APIs sit on top of Twister2 and Cylon; Cylon provides linear/relational algebra operators, distributed linear/relational algebra operators (C++), distributed relational communication operations (C++), and communication kernels.]

Twister2 is one of 5 possible engines for Apache Beam, which implements a rich data processing (dataflow) workflow environment; the other engines are Spark, Flink, Samza, and Google Cloud Dataflow.


Twister2 Benchmarks with Big Data Processing Ecosystem


High Performance Data Engineering

  • Most data engineering frameworks are written in Java and are not suitable for HPC environments
  • Good performance and scalability are needed to match large-scale AI workloads
  • Python is a favorite in data analytics
    • PyTorch, TensorFlow
    • Scikit-Learn
  • Data engineering with Python is therefore the best fit
    • PySpark
    • Pandas, Modin
  • Python provides higher productivity
  • High performance for Python can be enabled via kernels in C++
    • Cython
    • Pybind11 (Python bindings of existing C++ code)


Cylon Architecture

Builds on Apache Arrow to link the Python world (Jupyter, Numpy, Pandas, Modin) with the Java world (Spark, Twister2) and the C++/CUDA world (high-performance deep learning on PyTorch and TensorFlow).


Cylon: A High Performance Distributed Data Table

  • Cylon is a high-performance C++ kernel and a distributed runtime for data pre-processing
    • Apache Parquet and Arrow based storage and in-memory data structures
      • Supports seamless integration with deep learning workloads, Pandas, and Numpy
      • Zero-copy data transfer between heterogeneous systems and languages (illustrated below)
  • Table API, an abstraction of ETL (extract, transform, load) for scientific computing and deep learning workloads
    • Join, Union, Intersect, Difference, Product, Project
  • Currently we support Joins (all formats); the other components are in development
  • Written in C++, with APIs available in Java and Python
  • Cylon is the high-performance kernel of Twister2
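A minimal sketch of the Arrow-based interoperability that Cylon builds on, shown here with pyarrow as an illustration (the PyCylon API itself is not reproduced here):

```python
# Sketch: an Arrow table is a language-neutral, columnar, in-memory structure
# that can be shared across the Python, Java, and C++ ecosystems.
import pyarrow as pa

left = pa.table({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3]})

# Convert to pandas for Python-side work; Arrow's columnar buffers make such
# transfers cheap (zero-copy for some types).
df = left.to_pandas()
df["x2"] = df["x"] * 2

# Back to Arrow, ready to hand off to a C++/Java runtime or write to Parquet.
result = pa.Table.from_pandas(df)
print(result.schema)
```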


Performance Comparison with Other Frameworks

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • 160 processes across 10 nodes
  • Inner join
  • 200M records per relation (left/right)


Cylon Performance with Language Bindings

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • An equal number of processes across 8 nodes
  • Inner join
  • 200M records per relation (left/right)


Large Scale Experiments with PySpark and PyCylon

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • 200 processes across 10 nodes
  • Inner join
  • 10B records per relation (left/right)


Jupyter Notebooks: Data Conversion and Usability

  • Seamlessly integrates with DL/ML frameworks via a Numpy/tensor abstraction
  • PyCylon provides endpoints to load data from existing data engineering libraries
    • Pandas
    • Numpy
    • PyArrow
  • Supports Jupyter Notebooks with a distributed computation model


Future Cylon Work

  • Improving the Python ecosystem
    • UCX integration to support a variety of communication kernels
    • Integration with Dask, supporting it as an execution backend
    • Extending the Table API towards a DataFrame API (supporting Modin)
    • RAPIDS cuDF integration
  • Improving integration between the "two ecosystems"
    • Accelerating Twister2 with Cylon compute and communication kernels
    • Twister2 improvements will enhance big data processing
  • Increasing the application test suite


Sound Bites for Cylon

  • PyCylon is a fast and scalable backend
  • High-performance compute and communication kernels
  • Seamlessly integrates with existing data engineering libraries
  • Seamlessly integrates with ML/DL frameworks
  • Can be used as a framework or as a library
  • Rich data engineering APIs with flexibility and high performance
  • Supports high-performance distributed data engineering kernels
  • As a library, improves the performance of existing data engineering frameworks
    • High-performance kernels for Modin and Pandas
    • Links efficiently to the Java Big Data processing ecosystem
  • As a framework, provides high-performance scalable kernels and APIs for application developers
    • Table-centric data engineering abstractions


Conclusions

  • Twister2 is ready to go
  • Big Data systems
  • Parallel computing
  • MLforHPC


Conclusions

  • Parallel computing is still healthy, with Deep Learning and surrogates giving new challenges
  • Opportunities exist in ML for HPC as well as HPC for ML
  • The switch to Deep Learning for Big Data is making good progress
    • Many new algorithms remain to be developed, including for geospatial time series
  • Consider science research benchmarks in MLPerf
    • Include surrogates
    • Use them to produce tutorials
  • Enhance collaboration between industry and research, and between the HPC and MLPerf/MLSys communities
  • Support common environments from edge to cloud with systematic attention to HPC
  • There is some community confusion between ML vs. DL, and between general data engineering and support of Deep Learning
  • Twister2/Cylon offer many of the advantages of Spark, with attention to both HPC and deep learning; streaming and batch; Java, Python, and C++


Thank you!

External collaborators at Argonne National Lab, Arizona State University, Kansas, Rutgers, Stony Brook, UT Knoxville, the University of Virginia, and MLPerf

Indiana University Digital Science Center:

Faculty: David Crandall, James Glazier, Vikram Jadhao, Judy Qiu and others

Staff: Josh Ballard, Gary Miksik, Fugang Wang, Chathura Widanage

Researchers: Gurhan Gunduz, Supun Kamburugamuve, Ahmet Uyar, Gregor von Laszewski

Students: Vibhatha Abeykoon, Bo Feng, JCS Kadupitiya, Niranda Perera, Pulasthi Wickramasinghe
