
Advances in High Performance Computing and Deep Learning: Data Engineering and Data Science


Geoffrey Fox

Digital Science Center, Indiana University, gcf@indiana.edu


Interesting Changes in Fields and Communities

  • At the beginning of parallel computing, hardware, software, and algorithms were all vibrant areas
    • and leading areas of computer science research -- everybody had transputers or iPSC/2s
  • After a while, hardware became hard to pursue in universities because of the infrastructure needed
  • Later, (parallel) algorithms were so successful that perhaps interest waned
  • High Performance Computing (HPC) thrived due to NSF/DoE/Europe/Japan/China national networks and the success of computational science algorithms
    • Judged by citations, conference attendance, industry jobs, and faculty jobs
  • Big Data and Deep Learning changed nearly everything
  • HPC has become essential infrastructure; more engineering than research, with fewer academic opportunities (there should be more). Industry and DoE in the USA have jobs.
  • Algorithms are thriving in the form of deep learning (DL), including simulation surrogates
    • Other forms of machine learning are in decline, replaced by DL
  • Big data problems are not as parallel as simulations; typically modest-size parallel machines can cope (combined with a large number of independent or loosely coupled jobs)
  • Software is as always active, although the top areas are changing


Remarks on the Convergence of Big Data, Simulation, and HPC

  • HPC integration with both Big Data and (Big) Simulation has one clear aspect: both areas need the high performance reached by HPC technology
    • Clear in the exascale initiative, the success of computational science, and GPUs for Deep Learning
    • Part of the Systems for ML theme in Dean's talk at NeurIPS 2017 for Big Data
    • We term this HPCforML; it is reasonably clear how to support it in hardware and software, as seen in DoE systems
    • Details are uncertain, partly due to uncertainties in industry directions, HPC clouds vs. supercomputers, etc.
  • We can also integrate Big Data technology (from the Apache software stack to containers) with simulations
  • Dean also discussed the use of machine learning to enhance systems, which becomes MLforHPC when the system is built on HPC technology
    • This actually involves applications and not just the system
  • Even broad principles for MLforHPC software and hardware support are unclear at this early stage


HPCforML: Similar Challenges in Parallelism for Big Data and Simulation (Complexity of Synchronization and Parallelization)


[Figure: problem categories arranged by synchronization/parallelization complexity, from loosely coupled through regular coupling to complex coupling, with increasing data on one axis and simulations on the other]

  • Pleasingly Parallel (loosely coupled; straightforward or user-performed parallelism): often independent events; MapReduce as in scalable databases; the current major Big Data category; parameter sweep simulations; runs on commodity clouds
  • Structured Adaptive Sparse (regular coupling): regular simulations; HPC clouds with accelerators and high-performance interconnect
  • Global Machine Learning (e.g., parallel clustering) and Deep Learning: linear algebra at the core (often not sparse); HPC clouds/supercomputers
  • Unstructured Adaptive Sparse (complex coupling): graph analytics, e.g., subgraph mining, and LDA; memory access also critical


HPCforML: Integration Challenges

NIPS 2015: http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

This well-known paper points out that parallel high-performance machine learning (the small "ML Code" box in its famous diagram) is perhaps the most fun part but just one piece of the system. We need to integrate it with the other data and orchestration components.

This integration is neither very good nor easy today, partly because data management systems like Spark are JVM-based, which does not link cleanly to the C++/Python world of high-performance ML.

Twister2 and Cylon at IU address this.


High Performance Computing and Deep Learning

  • ML for HPC
  • Surrogates


Let’s look at ML for HPC

  • Traditionally we did HPC for ML -- run deep learning on a GPU -- BUT ML for HPC is probably more interesting
  • Currently in science, ML is used to enhance simulations (not data analytics), and the ML used is dominantly Deep Learning
  • We introduced 3 major categories and 8 subcategories:
    1. Improving Simulation with ML-controlled Configurations and Integration of Data
    2. Use ML to Learn Structure, Theory and Model for Simulation
    3. Use ML to Learn Surrogates for Simulation
  • Work with Shantenu Jha


[Diagram: INPUT → simulation → OUTPUT, annotated with the eight MLforHPC subcategories]

1. Improving Simulation with Configurations and Integration of Data
   • 1.1 MLAutotuningHPC – Learn configurations
   • 1.2 MLAutotuningHPC – Learn models from data
   • 1.3 MLaroundHPC – Learning Model Details (ML-based data assimilation)
2. Learn Structure, Theory and Model for Simulation
   • 2.1 MLAutotuningHPC – Smart ensembles
   • 2.2 MLaroundHPC – Learning Model Details (coarse graining, effective potentials)
   • 2.3 MLaroundHPC – Improve Model or Theory
3. Learn Surrogates for Simulation
   • 3.1 MLaroundHPC – Learning Outputs from Inputs (parameters)
   • 3.2 MLaroundHPC – Learning Outputs from Inputs (fields)


Examples of ML for HPC (work with JCS Kadupitiya, Vikram Jadhao)

  • Uses a quite small multi-layer perceptron (MLP) to predict 150 observables from 5 input parameters (~5000 simulations in the training set); a sketch of such a surrogate follows below
  • The MLP outperforms other ML choices
  • Deployed on nanoHUB for education (an attractive use of surrogates, since students get answers fast)
  • General Electric uses a similar approach to give interactive engine design options (200 in the training set)

[Figure: "The Learning Net", with direct simulation compared to surrogates; speedup → 10⁶ as Nlookup → ∞]

  • Extraction of ionic structure in electrolyte solutions confined by planar and spherical surfaces
  • Classic HPC code written in C++ and accelerated with hybrid MPI-OpenMP
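A minimal sketch of such an MLP surrogate (illustrative only, not the nanoHUB code; layer sizes and the random stand-in data are assumptions):

```python
# Sketch: a small MLP surrogate mapping 5 simulation input parameters to 150
# predicted observables, as described on this slide. Data here is random
# stand-in for the ~5000 training simulations.
import numpy as np
import tensorflow as tf

X = np.random.rand(5000, 5).astype("float32")    # 5 input parameters
Y = np.random.rand(5000, 150).astype("float32")  # 150 observables

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(5,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(150),                  # linear output for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=10, batch_size=64, validation_split=0.1)

# Inference replaces a full MPI-OpenMP simulation run with one forward pass.
pred = model.predict(X[:1])
```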


Up to two billion times acceleration of scientific simulations with deep neural architecture search

  • January 23, 2020: https://arxiv.org/pdf/2001.08055.pdf
  • 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, all using the same super-architecture, algorithm, and hyperparameters
  • The approach also dynamically chooses the deep network and provides uncertainty estimation, adding further confidence in the surrogates' use


Insilico Medicine Used Creative AI to Design Potential Drugs in Just 21 Days

  • Map drug (material) structure to drug (material) properties
  • Hong Kong-based Insilico Medicine sent shockwaves through the pharma industry after publishing research in Nature Biotechnology showing its AI-powered drug discovery system was capable of producing at least one potential treatment for fibrosis in less than a month's time.
  • The system uses a deep reinforcement learning algorithm that can imagine potential protein structures based on existing research and certain preprogrammed design criteria.
  • Insilico's system initially produced 30,000 possible designs, which the research team whittled down to six that were synthesized in the lab, with one design eventually tested on mice with promising results.
  • Insilico's AI-powered research process could offer a massive push forward for the pharmaceutical industry, which faces increasingly high drug development costs. In just a handful of weeks and for approximately $150,000, Insilico delivered what typically takes pharmaceutical companies $2.6 billion over seven years.


September 4, 2019 news item


Operator Formulation of Deep Learning Inference

  • Suppose we are solving PDEs or sets of coupled ODEs
  • Typically we solve iteratively: New Values = (Differential Operator) Previous Values
  • Classic applied math gives nifty difference equations and spectral methods to represent the operator numerically
  • Deep Learning learns the operator from classic numerics, observational data, or their combination
  • Inference is then: New Values = (DL Operator) Previous Values, as sketched below
  • This new nonlinear trained DL operator can allow much larger time steps, incorporate variations in parameters, learn potentials, etc.
  • The DL operator is the new theory (Newton's laws) of science
  • High-order approximations are traditionally very sensitive to noise, and one was taught to avoid them, but deep NNs are the opposite: both verbose and robust
    • Compare the DL operator with multiple LSTM layers and 100s to 100,000 parameters
    • Newton's laws for this have 2-4 parameters


Learn Newton’s laws with Recurrent Neural Networks

  • Deep Learning is revolutionizing (spatial) time series analysis
  • A good example is integrating sets of differential equations
  • Train the network on traditional 5-time-step series from (Verlet) difference equations
  • Verlet needs a time step of 0.001 for reliable integration, but the learnt LSTM network is reliable for time steps 4000 times longer, and it also learns the potential
  • The speedup is 30000 on 16 particles interacting with Lennard-Jones potentials
  • 2-layer LSTM network with 64 units per layer: 65,072 trainable parameters
  • 5000 training simulations

[Figure: RNN error² up to step size dT = 4 and total time 10⁶, compared with Verlet error² at dT = 0.01 and 0.1]

JCS Kadupitiya, Vikram Jadhao


Results on different potentials (one particle)

  • Simple Harmonic Oscillator:
    • Hooke's law in velocity Verlet (VV)
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 1.8 sec
    • Mass (m) and spring constant (k) varied: 500 initial configurations
  • Lennard-Jones:
    • Lennard-Jones (LJ) potential in VV
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 2.7 sec
    • Mass (m) and initial position (x0) varied: 500 initial configurations
  • Double Well:
    • U(x) = x⁴/4 - x²/2 in VV
    • T = 100, ΔT_MD = 0.001
    • Runtime_MD = 1.9 sec
    • Mass (m) and initial position (x0) varied: 500 initial configurations
  • The VV integrator that generates these training trajectories is sketched below
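A minimal sketch of the velocity Verlet (VV) integrator referenced above, shown for the double-well potential; the initial condition is an illustrative assumption, not the authors' setup:

```python
# Sketch: velocity Verlet for the double-well potential U(x) = x^4/4 - x^2/2,
# whose force is F(x) = -dU/dx = x - x^3.
import numpy as np

def velocity_verlet(x0, v0, m, dt, n_steps, force):
    xs = np.empty(n_steps)
    x, v = x0, v0
    a = force(x) / m
    for i in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / m
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged force
        a = a_new
        xs[i] = x
    return xs

# T = 100 with dt = 0.001, as on the slide; one of the 500 varied initial configs.
traj = velocity_verlet(x0=1.2, v0=0.0, m=1.0, dt=0.001, n_steps=100_000,
                       force=lambda x: x - x**3)
```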

[Figure: classic simulation error for each potential]

Multiple versions of MLforHPC used in Simulating Biological Organisms (with James Glazier @IU)

  • Learning Model (Agent) Behavior: replace components by learned surrogates (reaction kinetics coupled ODEs)
  • Dynamic Data Assimilation
  • Theory to Instance
  • Smart Ensembles
  • All steps use MLAutotuning


Futures of ML for HPC

  • ML for HPC applies broadly, but current use is nonuniform across domains
  • Use of modest DL networks to map material/potential-drug structure to properties (generalized QSAR) with simulation and observation: advanced progress
  • Learning surrogates for large-scale simulations: good results with major speedups
  • Use of MLforHPC in agent-based systems (replace agents by learned surrogates): very promising but few results
    • Use in sociotechnical simulations and in virtual tissues (agents are people or cells)
  • Macroscopic structure, as in learning complex multi-particle potentials scaling as N⁷: many great successes
  • Learning collective coordinates and guiding ensemble computations: dramatic progress with speedups up to 10⁸
  • Microscale: learning the dynamics of small scales such as clouds and turbulence: interesting results but much more to do
  • Use of recurrent NNs to represent dynamics (learn numerical differential operators): promising but only studied on small problems
  • Learn errors as well as values in differential equation solutions
  • Minimize the number of expensive HPC simulations needed: the 2-billion paper is intriguing


Looking at COVID Distributions

  • Work with the Public Health Department at Pittsburgh
  • Gregor von Laszewski


Time Series Represented by Deep Learning

  • A Molecular Dynamics solution is "just" a time series, and we saw that DL derived an operator to describe it
    • Classic applied math also gives a numerical operator based on Newton's laws
  • Time series, or rather sequences, have two important cases
  • Forecast the future of time series observables:
    • Solve differential equations
    • Ride hailing and eCommerce delivery
    • Earthquakes
    • Environmental science
    • Spread of COVID
  • Predict new observables in the time interval of the time series (a sequence-to-sequence map):
    • Natural Language Processing: Turkish to English
    • Scheduling clouds or HPC systems


Basic Spatial (bag) Time Series

[Figure: data laid out over space x and time t. Space points are different data sources, not necessarily nearby. Input properties at each point are static (e.g., %Seniors) or dynamic (e.g., COVID cases per day). The data analysis unit is the time sequence at one space point. Two tasks: forecast the future (any number of time units, any number of properties), or predict now via a seq2seq map, as in English to French or rainfall to runoff. For Natural Language Processing, space points are different paragraphs or books, with a few sentences at each point; earthquake points are nearby.]


General Deep Learning Strategies

  • Attention means that you "learn" from other related data (the past)
  • A Transformer uses matrix features, comparing structure (scalar products) across other time and space points
  • An LSTM uses history passed through the time sequence
  • There are just a few studies of Transformers for the forecasting problem
  • Current implementations look at full attention or merge attention from space and time separately, but a lot of research is needed here for both training and inference

[Figure: the two different models used. (1) A hybrid Transformer (for the encoder) and LSTM (for the decoder); (2) a pure LSTM. Each stacks inputs, two LSTM layers with initial and final states, and outputs; the attention merge in the hybrid model is optional. The attention for point i is Q(from i) Kᵀ(from j) V(j) summed over j, where Q, K, V are dense layers on the input, i.e., linear combinations plus activations.]
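A minimal NumPy sketch (assumed, not the talk's implementation) of the attention rule in the figure; dimensions and weights are illustrative:

```python
# Sketch: attention for point i is sum over j of softmax(Q_i . K_j) V_j,
# where Q, K, V come from dense layers (linear combinations plus activations).
import numpy as np

def dense(x, W, b):
    return np.maximum(x @ W + b, 0.0)       # linear combination + ReLU activation

rng = np.random.default_rng(0)
n_points, d_in, d_k = 9, 16, 8              # e.g., a window of 9 sequence points
x = rng.normal(size=(n_points, d_in))

Wq, Wk, Wv = (rng.normal(size=(d_in, d_k)) for _ in range(3))
b = np.zeros(d_k)
Q, K, V = dense(x, Wq, b), dense(x, Wk, b), dense(x, Wv, b)

scores = Q @ K.T / np.sqrt(d_k)             # scalar products across points
scores -= scores.max(axis=1, keepdims=True) # numerically stable softmax
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)
attended = weights @ V                      # for each i: sum over j of w_ij V_j
```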


Pure LSTM description of the 205-day, 314-city data

[Figure: √(daily counts) summed over cities, from fits to individual times/cities for daily data with a 2-week prediction; red is the error]


Hybrid Transformer with intrinsic error for 314 cities/counties and 159 days (7.11 secs/epoch)


Hybrid Transformer with larger intrinsic error for 110 cities/counties and 115 days


Particular Regions from the Hybrid Transformer

[Figures: New York City; Chicago (Cook County)]


Particular Regions from the Hybrid Transformer

[Figures: Seattle (King County); Los Angeles]


Comments I

  • Google Colab Pro with 3300 lines of Python
    • TensorFlow with custom training; 5-10 seconds per epoch with a GPU
  • We used positional encoding of both input and output
    • The NLP Transformer uses a related but different mechanism -- the two need to be compared
  • Weekly structure motivates cos θ and sin θ as input and output time series, where θ runs from 0 to 2π over 7 days (sketched below)
    • A similar strategy could be used for other time periods (e.g., annual environmental data, daily traffic patterns)
    • Linear indicators for space and time were also used
  • The encoding is used in both the LSTM and hybrid models and clearly improves the fit
  • Inputs: daily cases and fatalities; 12-30 static properties; 4 encodings
    • Daily social distancing added for the 314-county dataset
  • Predictions: cases and fatalities for the next day and 14 days into the future; 4 encodings (weighted in the loss function); 34 outputs in total
    • Missing data is ignored in the custom loss function
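A minimal sketch of the weekly periodic encoding described above; the array sizes are illustrative:

```python
# Sketch: theta runs from 0 to 2*pi over each 7-day week, giving cos/sin
# features, plus a linear time indicator; analogous encodings can cover
# space and other periods (annual, daily).
import numpy as np

days = np.arange(205)                      # e.g., the 205-day dataset
theta = 2 * np.pi * (days % 7) / 7.0       # 0 -> 2*pi across each week

weekly_cos = np.cos(theta)
weekly_sin = np.sin(theta)
linear_time = days / days.max()            # linear time indicator in [0, 1]

encodings = np.stack([weekly_cos, weekly_sin, linear_time], axis=1)
```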


Comments II

  • Two datasets
    • 110 counties with 115 days (until May 25) and 30 static properties such as %seniors, %Hispanic, an asthma measure, number of beds, etc.
    • 314 counties with 205 days (until August 13) and 12 refined static properties
  • Results are shown summed over cities or for individual locations
  • The model is fitted to the square root of daily cases/deaths, as this agrees with an MSE loss function with uniform errors (for counting errors, √N has error O(1))
  • Window sizes 5-13 all look quite good; currently a sliding window of size 9 is used (sketched below)
  • The Transformer has a nontrivial issue in calculating total attention, as each sequence can match 553,895 other points
    • Calculated by Monte Carlo over attention points and groupings
    • The fit uses groups of 314-by-9 sequences randomized over space and time
  • Inference looks at further choices, but the larger dataset shows little error from the Monte Carlo
  • Current results use 4 attention heads
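A minimal sketch (with made-up stand-in counts) of the data preparation described above: the square-root transform and size-9 sliding windows randomized over space and time:

```python
# Sketch: fit to sqrt(daily counts) so counting errors are roughly uniform,
# then build sliding windows of size 9 over each county's time series.
import numpy as np

n_counties, n_days, window = 314, 205, 9
daily_cases = np.random.poisson(50.0, size=(n_counties, n_days)).astype("float64")

targets = np.sqrt(daily_cases)             # sqrt(N) has error O(1) for counts

samples = [
    targets[c, t:t + window]               # one window = 9 consecutive days
    for c in range(n_counties)
    for t in range(n_days - window)
]
X = np.stack(samples)                      # groups of 314-by-9 sequences,
rng = np.random.default_rng(0)             # randomized over space and time
rng.shuffle(X)
```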


Collection of Time Series Machine Learning Algorithms (MLPerf)

Areas and applications, with models, datasets, and papers:

  • Cars, taxis, freeway detectors: models TT-RNN, BNN, LSTM; papers [6-8]
  • Wearables and medical instruments (EEG, ECG, ERP, patient data): models LSTM, RNN; datasets OPPORTUNITY [9-10], EEG [11-14], MIMIC [15]; papers [16-20]
  • Intrusion, traffic classification, anomaly detection: model LSTM; papers [21, 23-25]
  • Household electric use, economics, finance, demographics, industry: models CNN, RNN; papers [28-29]
  • Stock prices versus time: models CNN, RNN; data available academically from Wharton [30]; paper [31]
  • Climate, tokamak: models Markov, RNN; papers [33-35]
  • Events: model LSTM; papers [36-37]
  • Language and translation (pre-trained data): model Transformer [38]; datasets [39-40]; papers [41-42], Mesh TensorFlow
  • All-neural on-device speech recognizer: model RNN-T; paper [43]
  • IndyCar racing (real-time car and track detectors): models HTM, LSTM; paper [44]
  • Twitter online clustering: data available from Twitter; papers [45-46]

Xinyuan Huang from Cisco and MLPerf


Lots of Applications and DL Opportunities: Hydrology

  • Essentially all application areas are in an exploratory stage, and many make only modest use of deep learning
  • Hydrology has an impressive 129 deep learning papers studying issues such as rainfall and runoff and their relation to terrain
  • I studied a few of these papers and believe that, although sound and interesting, none of them use the optimal choices of network and hyperparameters
  • We should gather examples and challenge the community to improve, as well as looking at other communities
  • We propose combining this with benchmarking activity and the industry MLPerf effort


https://eartharxiv.org/xs36g/


Time Series and Operators for Earthquakes

  • Earthquakes have a solid physics theory for the movement of the earth, but it is not very useful as it depends on largely unknown data on friction laws and fault structures
    • We observe this theory and its controlling data (the hidden variables) through observations of earthquakes over the years
  • The earthquake operator predicts probabilities of earthquakes of different magnitudes over different time intervals
    • As faults/data vary, this operator should be "nearest neighbor" spatial (2D)
  • Think of the region as an image with pixels holding log(total energy) in a time interval and use a ConvLSTM (sketched below)
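A minimal sketch (an assumption, not the actual model) of the ConvLSTM idea with random stand-in data; the grid size and window length are illustrative:

```python
# Sketch: treat the region as an image whose pixels hold log(total seismic
# energy) per time interval, and predict the next image from a window of
# previous ones with a ConvLSTM.
import numpy as np
import tensorflow as tf

frames, height, width = 8, 40, 60          # illustrative grid over the region
X = np.random.rand(100, frames, height, width, 1).astype("float32")
Y = np.random.rand(100, height, width, 1).astype("float32")   # next interval

model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same",
                               input_shape=(frames, height, width, 1)),
    tf.keras.layers.Conv2D(1, kernel_size=1),  # per-pixel energy prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=2, batch_size=8)
```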

[Figure: predicted vs. true values from the test set]


High Performance Computing and Deep Learning Benchmarks


MLPerf Consortium Deep Learning Benchmarks

Some Relevant Working Groups

  • Training
  • Inference (Batch and Streaming)
  • TinyML (embedded)
  • Deep Learning for Time Series
  • Power
  • Datasets
  • HPC (DoE Labs)
  • Research
  • Science Data (just approved)

MLPerf's mission is to build fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.

Benchmark what the user sees

Used for purchasing decisions worth >$1B USD in a rapidly growing market (the ML chipset market in 2025 is ~$60B)

Total effort: 50 FTE

  • Accelerate progress in ML via fair and useful measurement
  • Serve both the commercial and research communities
  • Enable fair comparison of competing systems yet encourage innovation to improve the state-of-the-art of ML
  • Enforce replicability to ensure reliable results
  • Keep benchmarking effort affordable so all can participate

73 companies; 10 universities


MLPerf Industry Machine Learning Site: Training v0.7

https://mlperf.org/ -- ALL deep learning, but it's MLPerf, not DLPerf

[Figure: the Training v0.7 benchmarks: images, images, images, translation, translation, voice, recommender, play Go. The largest submissions, 2048 TPUs or 1536 V100s with InfiniBand, are quite powerful.]


Science Data WG in MLPerf

  • Extends MLPerf to address Science Research Data
  • There is no existing scientific data benchmarking activity with a similar flavor to MLPerf -- namely, one addressing important realistic problems and aiming at modern data analytics, including deep learning, on modern high-performance analysis systems.
  • Further, the challenges of science data benchmarking both benefit from the approach of MLPerf and will be synergistic with existing working groups.
  • Science, like industry, involves edge and data-center issues, end-to-end systems, inference, and training. There are some similarities in the datasets and analytics, as both industry and science involve image data, but also differences; science data associated with simulations and particle physics experiments is quite different from most industry exemplars.
  • Science datasets are often large and growing in size, while the multitude of active areas gives diverse challenges. The best practice science algorithms are shifting to deep learning approaches as in industry today.
  • Benchmarks will help more science fields take advantage of modern ML


Science Data MLPerf working group

  • We foresee that scientific machine learning benchmarks for MLPerf will include a number of datasets, from each of the scientific domains, along with a representative problem from those domains.
  • When fully contributed, the benchmark suite will cover (at least) the following domains: material sciences, environmental sciences, life sciences, fusion, particle physics, astronomy, earthquake and earth sciences, with more than one representative problem from each of these domains

35

One aim is to provide a mechanism for assessing the capability of different ML models in addressing different scientific problem

Build tutorials around benchmarks


Possible Initial MLPerf Science Benchmarks

  • SciML: a suite of scientific machine learning benchmarks from Tony Hey and Jeyan Thiyagalingam; all data has been labelled; all open source
  • Cloudmask: identify clouds at the pixel level in multispectral satellite images; U-Net reference implementation; 1.8 TB
  • EMNoise: increase the signal-to-noise ratio of electron microscope images; many DL/ML approaches, but a U-Net reference implementation is provided; 5 GB
  • DiffuseScatter: analyze DMS (Diffuse Multiple Scattering) from materials irradiated by light sources to choose between 2 crystal structures; CNN reference implementation; 9 GB
  • The repository below contains reference implementations and command line tools for easily configuring and running the benchmarks


https://github.com/stfc-sciml/sciml-benchmarks


Cloud Masking

  • Given a set of satellite images, identify the pixels that are cloud
  • Pixel-wise classification: cloud / non-cloud
  • Confusions: sun glint, fog, dust plumes, snow, sea ice, etc.
  • No ground truth

Segmentation & Classification


SBI: Surrogate Benchmark Initiative
FAIR Surrogate Benchmarks Supporting AI and Simulation Research

PI: Geoffrey Fox, IU

Replacing traditional HPC computations with Deep Learning surrogates can improve the performance of simulations and make optimal use of diverse architectures

  • Fitting of hardware to surrogates
  • Uncertainty Quantification of the surrogate estimates
  • Minimize Training Data Size needed to get reliable surrogates for a given accuracy choice.
  • Develop and test surrogate Performance Models

SBI collaborates with Industry and a leading machine learning benchmarking activity -- MLPerf

GOAL: Accelerate and better understand Deep Learning surrogate models that can replace all or part of traditional large-scale HPC computations with major performance increases.

Software Research: SBI will design and build general middleware to support the generation and the use of surrogates.

A Findable, Accessible, Interoperable, and Reusable (FAIR) data ecosystem for HPC surrogates

Application Benefits: SBI will also make it easier for general users to develop new surrogates and help make their major performance increases pervasive across DoE computational science.


Technology for Benchmarking

  • Build on MLPerf benchmarking technology
  • MLBox access to data and models from Kubernetes, Jupyter, and HPC
  • Build FAIR (Findable, Accessible, Interoperable and Reusable) metadata on top of the work of the Logging and Platform Working Groups
  • Incorporate this in demonstrations and tutorials


Call for Action!

  • Develop best practice, datasets, tutorial material, and benchmarking technology
  • Join the MLPerf Science Data working group by joining MLPerf (automatic at the web site https://groups.google.com/forum/#!forum/mlperf) and then requesting to join working groups at https://mlperf.org/get-involved/#join-working-groups
    • We would have a special relationship with the HPC MLPerf working group, as much scientific data is analyzed by HPC systems
    • We would have synergy with the Deep Learning for Time Series (DeepTS) MLPerf working group, as many scientific datasets correspond to time series
    • Join these other working groups!


Data Engineering and Deep Learning

Deep Learning Infrastructure with Cylon and Twister2

[Figure: the technical-debt diagram again, now centered on a small "DL Code" box: it implies we need deep learning plus general data engineering]


Deep Learning Workflow

The workflow often divides into two parts:

Data => Information: preprocessing with Hadoop, Spark, Twister2, Scikit-Learn

Information => Knowledge: the compute-intensive step with Cylon-enhanced Spark, Twister2, PyTorch, and TensorFlow


Data Engineering

  • Data engineering is formulating structured data from raw data with ETL (extract, transform, load) operations; a small sketch follows below
  • Data engineering enables Deep Learning (DL) and Machine Learning (ML) workflows
  • Data engineering is used in:
    • Model prototyping
    • Models in production
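A minimal illustrative ETL sketch in pandas (a made-up example, not from the talk; the column names and values are assumptions):

```python
# Sketch: extract raw records, transform them into typed structured data,
# and load a table ready for a DL/ML workflow.
import pandas as pd

raw = pd.DataFrame({                       # extract: raw event records
    "timestamp": ["2020-08-01", "2020-08-02", None],
    "county": ["Marion", "Monroe", "Marion"],
    "cases": ["12", "7", "9"],
})

df = raw.dropna(subset=["timestamp"])      # transform: clean and type the data
df = df.assign(
    timestamp=pd.to_datetime(df["timestamp"]),
    cases=df["cases"].astype(int),
)
table = df.groupby("county")["cases"].sum().reset_index()

table.to_csv("cases_by_county.csv", index=False)  # load: structured output
```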


Two Ecosystems

  • Enterprise: Java (initial data engineering)
  • Research labs, universities: Python (final deep learning)

GOAL: High performance within each ecosystem and high-performance integration between the two ecosystems.


Twister2

[Figure: the Big Data processing ecosystem. Dataflow APIs sit on top of Twister2 and Cylon; Cylon provides linear/relational algebra operators, distributed linear/relational algebra operators (C++), distributed relational communication operations (C++), and communication kernels.]

Twister2 is one of 5 possible engines for Apache Beam, which implements a rich data processing (dataflow) workflow environment; the other engines are Spark, Flink, Samza, and Google Cloud Dataflow.


Twister2 Benchmarks with Big Data Processing Ecosystem


High Performance Data Engineering

  • Most data engineering frameworks are written in Java and are not suitable for HPC environments
  • Good performance and scalability are needed to match large-scale AI workloads
  • Python is a favorite in data analytics
    • PyTorch, TensorFlow
    • Scikit-Learn
  • Data engineering with Python is therefore the best fit
    • PySpark
    • Pandas, Modin
  • Python provides higher productivity
  • High performance for Python can be enabled via kernels in C++
    • Cython
    • Pybind11 (Python bindings of existing C++ code)


Cylon Architecture

Builds on Apache Arrow to link the Python world (Jupyter, Numpy, Pandas, Modin) with the Java world (Spark, Twister2) and the C++/CUDA world (high-performance deep learning on PyTorch and TensorFlow).


Cylon: A High Performance Distributed Data Table

  • Cylon is a high-performance C++ kernel and a distributed runtime for data pre-processing
    • Apache Parquet and Arrow based storage and in-memory data structures
      • Supports seamless integration with deep learning workloads, Pandas, and Numpy
      • Zero-copy data transfer between heterogeneous systems and languages (illustrated below)
  • Table API, an abstraction of ETL (extract, transform, load) for scientific computing and deep learning workloads
    • Join, Union, Intersect, Difference, Product, Project
  • Currently we support Joins (all formats); the other components are in development
  • Written in C++, with APIs available in Java and Python
  • Cylon is the high-performance kernel of Twister2
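A minimal sketch of the Arrow-based interoperability that Cylon builds on, shown here with pyarrow as an illustration (the PyCylon API itself is not reproduced here):

```python
# Sketch: an Arrow table is a language-neutral, columnar, in-memory structure
# that can be shared across the Python, Java, and C++ ecosystems.
import pyarrow as pa

left = pa.table({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3]})

# Convert to pandas for Python-side work; Arrow's columnar buffers make such
# transfers cheap (zero-copy for some types).
df = left.to_pandas()
df["x2"] = df["x"] * 2

# Back to Arrow, ready to hand off to a C++/Java runtime or write to Parquet.
result = pa.Table.from_pandas(df)
print(result.schema)
```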


Performance Comparison with Other Frameworks

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • 160 processes across 10 nodes
  • Inner join
  • 200M records per relation (left/right)


Cylon Performance with Language Bindings

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • An equal number of processes across 8 nodes
  • Inner join
  • 200M records per relation (left/right)


Large Scale Experiments with PySpark and PyCylon

  • Intel® Xeon® Platinum 8160 processors
  • Each node has 255 GB of RAM and a mounted SSD
  • InfiniBand with 40 Gbps bandwidth
  • 200 processes across 10 nodes
  • Inner join
  • 10B records per relation (left/right)


Jupyter Notebooks: Data Conversion and Usability

  • Seamlessly integrates with DL/ML frameworks via a Numpy/tensor abstraction
  • PyCylon provides endpoints to load data from existing data engineering libraries
    • Pandas
    • Numpy
    • PyArrow
  • Supports Jupyter Notebooks with a distributed computation model


Future Cylon Work

  • Improving the Python ecosystem
    • UCX integration to support a variety of communication kernels
    • Integration with Dask, supporting it as an execution backend
    • Extending the Table API towards a DataFrame API (supporting Modin)
    • RAPIDS cuDF integration
  • Improving integration between the "two ecosystems"
    • Accelerating Twister2 with Cylon compute and communication kernels
    • Twister2 improvements will enhance big data processing
  • Increasing the application test suite


Sound Bites for Cylon

  • PyCylon is a fast and scalable backend
  • High-performance compute and communication kernels
  • Seamlessly integrates with existing data engineering libraries
  • Seamlessly integrates with ML/DL frameworks
  • Can be used as a framework or as a library
  • Rich data engineering APIs with flexibility and high performance
  • Supports high-performance distributed data engineering kernels
  • As a library, improves the performance of existing data engineering frameworks
    • High-performance kernels for Modin and Pandas
    • Links efficiently to the Java Big Data processing ecosystem
  • As a framework, provides high-performance scalable kernels and APIs for application developers
    • Table-centric data engineering abstractions


Conclusions

  • Twister2 is ready to go
  • Big Data systems
  • Parallel computing
  • MLforHPC


Conclusions

  • Parallel computing is still healthy, with Deep Learning and surrogates giving new challenges
  • Opportunities exist in ML for HPC as well as HPC for ML
  • The switch to Deep Learning for Big Data is making good progress
    • Many new algorithms remain to be developed, including for geospatial time series
  • Consider science research benchmarks in MLPerf
    • Include surrogates
    • Use them to produce tutorials
  • Enhance collaboration between industry and research, and between the HPC and MLPerf/MLSys communities
  • Support common environments from edge to cloud with systematic attention to HPC
  • There is some community confusion between ML vs. DL, and between general data engineering and support of Deep Learning
  • Twister2/Cylon offer many of the advantages of Spark, with attention to both HPC and deep learning; streaming and batch; Java, Python, and C++


Thank you!

External collaborators at Argonne National Lab, Arizona State University, Kansas, Rutgers, Stony Brook, UT Knoxville, the University of Virginia, and MLPerf

Indiana University Digital Science Center:

Faculty: David Crandall, James Glazier, Vikram Jadhao, Judy Qiu and others

Staff: Josh Ballard, Gary Miksik, Fugang Wang, Chathura Widanage

Researchers: Gurhan Gunduz, Supun Kamburugamuve, Ahmet Uyar, Gregor von Laszewski

Students: Vibhatha Abeykoon, Bo Feng, JCS Kadupitiya, Niranda Perera, Pulasthi Wickramasinghe
