Perspectives on High-Performance Computing in a Big Data World

The 28th International Symposium on High-Performance Parallel and Distributed Computing

gcf@indiana.edu, http://www.dsc.soic.indiana.edu/, http://spidal.org/

ACM HPDC 2019, Phoenix, Arizona, USA

Geoffrey Fox | June 27, 2019

Digital Science Center

Outline

  • Discussion of Communities from HPC to Edge to Big Data to Machine Learning and Cloud
  • Aligning with Industry in an AI First world
  • “Machine Learning for Systems” or MLforHPC
    • Scenarios
    • Examples
    • Some open questions
  • Conclusions

PERSPECTIVES ON HIGH-PERFORMANCE COMPUTING IN A BIG DATA WORLD


Data on the Evolution of Interests and Communities


Qian Depei: The difference between AI and computer architecture academic research is hot


Number of papers published 4960 vs 243

H5-index values shown on the chart: 45, 158, 91, 56, 89, 101, 98, 46, 54, 41, 50

Papers Submitted: Comparing 4 Conference Types


SumCI: SC, eScience, CCGrid, IPDPS

SumCloud: IEEE Cloud, Cloudcom


Attendance at Major AI Conferences


Papers Submitted at 7 Conferences

H5-index values shown on the chart: 39, 28, 43, 47, 37, <12, 25

H5-index Conferences: AI, Big Data, Cloud, Systems, Other

158 CVPR: Conference on Computer Vision and Pattern Recognition

101 NeurIPS: Neural Information Processing Systems

98 ECCV: European Conference on Computer Vision

91 ICML: International Conference on Machine Learning

89 ICCV: International Conference on Computer Vision

85 CHI: Computer Human Interaction

80 INFOCOM: Joint Conference of the Computer and Communications Societies

77 WWW:  International World Wide Web Conferences

73 VLDB: International Conference on Very Large Databases

73 SIGKDD: International Conference on Knowledge discovery and data mining

71 ICRA: International Conference on Robotics and Automation

56 AAAI: Assoc. Adv. AI Conference on Artificial Intelligence

54 ISCA: International Symposium on Computer Architecture

50 IROS: International Conference on Intelligent Robots and Systems

50 ASPLOS: International Conference on Architectural Support for Programming Languages and Operating Systems

47 SC: International Conference on High Performance Computing, Networking, Storage and Analysis

46 HPCA: International Symposium on High Performance Computer Architecture

45 IJCAI: International Joint Conference on Artificial Intelligence

43 BMVC: British Machine Vision Conference

43 IPDPS: International Symposium on Parallel & Distributed Processing

41 MICRO: International Symposium on Microarchitecture


39 CLOUD: International Conference on Cloud Computing

39 OSDI: Symposium on Operating Systems Design and Implementation

37 OOPSLA: SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

37 PPOPP: SIGPLAN Symposium on Principles & Practice of Parallel Programming

37 CCGrid: International Symposium on Cluster Computing and the Grid

34 ICIP: International Conference on Image Processing

34 ICPR: International Conference on Pattern Recognition

30 SoCC: Symposium on Cloud Computing

29 ECAI: European Conference on Artificial Intelligence

29 HPDC: International Symposium on High Performance Distributed Computing

28 CloudCom: International Conference on Cloud Computing Technology and Science

26 ICS: International Conference on Supercomputing

25 Big Data: International Conference on Big Data

21 CLUSTER: International Conference on Cluster Computing

21 SPAA: Symposium on Parallelism in Algorithms and Architectures

20 ICPP: International Conference on Parallel Processing

18 ICCSA: International Conference on Computational Science and Its Applications

15 SBAC-PAD: International Symposium on Computer Architecture and High Performance Computing

14 DS-RT: International Symposium on Distributed Simulation and Real-Time Applications

Some didn’t make h5-index cut of >=12


Some large areas: Google Trends last 5 years (Topics unless otherwise stated)

Search terms require exact match - topics are broader but sometimes are not available


SECURITY MAX IS 100

  • Cloud Computing (Search Term)
  • Big Data
  • Computer Science (Field)
  • Artificial Intelligence
  • Security


  • AI rising
  • Security and CS flattish
  • Clouds and Big Data small

Some smaller areas: Google Trends last 5 years (Topics unless stated)


  • SuperComputing (Search Term)
  • Edge Computing
  • Grid Computing
  • Cloud Computing (Search Term)
  • HPC


Some medium size areas: Google Trends last 5 years (Topics unless otherwise stated)


AI IS 10X BIG DATA, CYBERINFRASTRUCTURE, EXASCALE SMALL

  • High Performance Computing (HPC)
  • Deep Learning
  • Big Data
  • Machine Learning (Search Term)
  • Internet of Things


  • IoT, ML, DL growing
  • Big Data and HPC flattish, with HPC < Big Data

More on the Evolution of Interests and Communities

  • AI/ML
  • Systems
  • HPC
  • Cloud
  • Big Data


Importance of HPC, Cloud and Big Data Community

  • HPC/HPDC Community not growing in terms of obvious metrics such as new faculty advertisements, student interest, papers published
  • This is happening even though processing Big Data obviously requires HPC unless it is dominantly Hadoop style big data management
  • HPC community could perhaps align better with mainstream (Industry) systems
    • Otherwise it may be ignored, as the mainstream is larger and supported by Industry
  • SysML Conference Stanford March 31 - April 2, 2019 is a new mainstream systems + ML community (note speaker ratio as 15.5 Academia to 14.5 Industry)
  • Cloud Community quite strong in Industry; relatively small academically as Industry has some advantages
  • Big data community strong in Academia and Industry although definition less clear as most things are big data; growing but still quite small in terms of dedicated activities


Importance of AI

  • AI (and several forms of ML) will dominate the next 10 years and it has distinctive impact on applications whereas HPC, Clouds and Big Data are important and essential enablers
  • AI First popular with Industry with 2017 Headlines
    • The Race For AI: Google, Twitter, Intel, Apple In A Rush To Grab Artificial Intelligence Startups
    • Google, Facebook, And Microsoft Are Remaking Themselves Around AI
    • Google: The Full Stack AI Company
    • Bezos Says Artificial Intelligence to Fuel Amazon's Success
    • Microsoft CEO says artificial intelligence is the 'ultimate breakthrough'
    • Tesla’s New AI Guru Could Help Its Cars Teach Themselves
    • Netflix Is Using AI to Conquer the World... and Bandwidth Issues
    • How Google Is Remaking Itself As A “Machine Learning First” Company
    • If You Love Machine Learning, You Should Check Out General Electric
  • Could refine emphasis on data science with AI First X
    • where X runs over areas where AI can help
    • e.g. AI First Engineering; AI First Cyberinfrastructure; AI First Social Science etc.


Aligning with Industry


Learning from Industry

  • Industry playing a larger role than 10-20 years ago in advancing research as well as development (where they always led)
  • So useful to work with and learn from them: here are four important directions
  • All Industry: Public and Private Clouds 94% workloads in 2021: we should focus on HPC Clouds and maximize HPC Cloud Interoperability
  • ML Companies: Work with MLPerf
  • Microsoft: Global AI Supercomputer: we should be part of global scope and research architecture
  • Google: Machine Learning for Systems and Systems for Machine Learning: use ML for Systems to transform all computing (simulations and analytics)


Dominance of Cloud Computing

  • 94 percent of workloads and compute instances will be processed by cloud data centers (22% CAGR) by 2021; only six percent will be processed by traditional data centers (-5% CAGR).
  • Hyperscale data centers will grow from 338 in number at the end of 2016 to 628 by 2021. They will represent 53 percent of all installed data center servers by 2021. They form a distributed Compute (on data) grid with some 50 million servers
  • Analysis from CISCO https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.html updated November 2018


Number of instances per server

Number of Cloud Data Centers

Number of Public or Private Cloud Data Center Instances


Note Industry Dominance

MLPerf's mission is to build fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services. MLPerf was founded in February, 2018 as a collaboration of companies and researchers from educational institutions. MLPerf is presently led by volunteer working group chairs. MLPerf could not exist without open source code and publically available datasets others have generously contributed to the community.


Performance of Time Series Machine Learning Algorithms (MLPerf)


Columns: Areas, Applications, Model, Data sets, Papers

  • Cars, Taxis, Freeway Detectors; TT-RNN, BNN, LSTM; [6-8]
  • Wearables, Medical instruments: EEG, ECG, ERP, Patient Data; LSTM, RNN; OPPORTUNITY [9-10], EEG [11-14], MIMIC [15]; [16-20]
  • Intrusion, classify traffic, anomaly detection; LSTM; [21, 23-25]
  • Household electric use, Economic, Finance, Demographics, Industry; CNN, RNN; [28-29]
  • Stock Prices versus time; CNN, RNN; Available academically from Wharton [30]; [31]
  • Climate, Tokamak; Markov, RNN; [33-35]
  • Events; LSTM; [36-37]
  • Language and Translation; Pre-trained Data; Transformer [38]; [39-40]; [41-42] Mesh Tensorflow
  • All-Neural On-Device Speech Recognizer; RNN-T; [43]
  • IndyCar Racing; Real-time car and track detectors; HTM; [44]
  • Twitter; Online Clustering; Available from Twitter; [45-46]


http://jeffnolan.com/wp/2015/11/23/the-fascinating-implications-for-autonomous-vehicles/

https://www.comsol.com/blogs/analyzing-a-component-of-the-iter-tokamak-with-simulation/

https://www.oreilly.com/ideas/identifying-viral-bots-and-cyborgs-in-social-media

https://www.caretakermedical.net/

https://www.eventsforce.com/

https://money.cnn.com/2013/09/13/investing/stocks-markets/index.html

https://www.stickpng.com/img/icons-logos-emojis/tech-companies/twitter-logo

https://www.flir.com/products/cameleon-tactical/

https://www.androidpolice.com/2019/03/12/google-bringing-faster-on-device-speech-recognition-to-gboard-starting-with-pixel-phones/


Microsoft Summer 2018: Global AI Supercomputer


By Donald Kossmann


Overall Global AI and Modeling Supercomputer GAIMSC Architecture

  • Global says we are all involved - it is an HPDC system
  • I added “Modeling” to get the Global AI and Modeling Supercomputer GAIMSC
  • There is only a cloud at the logical center, but it’s physically distributed and dominated by a few major players
  • Modeling was meant to include classic simulation oriented supercomputers
  • Even in Big Data, one needs to build a model for the machine learning to use
  • GAIMSC will use classic HPC for data analytics which has similarities to big simulations (HPCforML)
  • GAIMSC must also support I/O centric data management with Hadoop etc.
  • Nature of I/O subsystem controversial for such HPC clouds
    • Lustre v. HDFS; importance of SSD and NVMe;
  • HPC Clouds would suggest that MPI runs well on Mesos and Kubernetes and with Java and Python


Dean at NeurIPS

DECEMBER 2017

  • ML for optimizing parallel computing (load balancing)
  • Learned Index Structure
  • ML for Data-center Efficiency
  • ML to replace heuristics and user choices (Autotuning)


Implications of Machine Learning for Systems and Systems for Machine Learning

  • We could replace “Systems” by “Cyberinfrastructure” or by “HPC” and/or “HPDC”
  • I use HPC as we are aiming at systems that support big data or big simulations and almost by (my) definition could naturally involve HPC.
  • So we get ML for HPC and HPC for ML or ML for HPDC and HPDC for ML
  • HPC for ML is very important but has been quite well studied and understood
    • It makes data analytics run much faster
  • ML for HPC is transformative both as a technology and for application progress enabled
    • If it is ML for HPC running ML, then we have the creepy situation of the AI supercomputer improving itself
    • Microsoft 2018 faculty summit discussed ML to improve Big Data systems e.g. configure database system.


MLforHPDC/HPC (ML for Systems) in detail

  • MLforHPDC/HPC can be further subdivided into several categories:
    • MLafterHPC: ML analyzing results of HPC as in trajectory analysis and structure identification in biomolecular simulations. Well established and successful
    • MLControl: Using simulations (with HPC) and ML in control of experiments and in objective driven computational campaigns. Here simulation surrogates are very valuable to allow real-time predictions.
    • MLAutotuning: Using ML to configure (autotune) ML or HPC simulations.
    • MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations or parts of simulations. The same ML wrapper can also learn configurations as well as results. Most Important.


MLaroundHPDC/HPC

MLAutotuning


Status of MLforHPC Research

ML for HPDC or Systems

  • http://dsc.soic.indiana.edu/publications/Learning%20EverywhereResource.pdf
  • 111 Citations (mainly 2017 or later) with very short comments
  • MLaroundHPC/MLAutotuning: not much on Computer Science or Partial Differential equations
  • Particle Dynamics: largest component, with smart sampling, effective potentials, “Computation Results from Computation defining Parameters” with “simple” deep learning replacing sophisticated dimension reduction; material science properties very active
    • Give examples from nanoparticle simulations
  • Agent-based Simulations in networked systems or virtual tissues. Perhaps most promising as inevitably data driven as no fundamental equations for cells, cars, people, bacteria
    • Review plan to develop new approach to computational systems biology -- simulate organisms based on models for cell


9 MLaroundHPC Scenarios


INPUT

OUTPUT


MLAutoTuningHPC: Learning Configurations

  • This is classic Autotuning and one optimizes some mix of performance and quality of results with the learning network inputting the configuration parameters of the computation.
  • This includes initial values and also dynamic choices such as block sizes for cache use, variable step sizes in space and time.
  • It can also include discrete choices as to the type of solver to be used.
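
As a rough illustration of learning a configuration choice, the sketch below recommends a block size for a new problem from earlier timed runs. It is only a stand-in: a nearest-neighbour lookup replaces the learning network, and the function name `recommend_block_size`, the `history` layout and all timings are hypothetical.

```python
def recommend_block_size(history, problem_size):
    """Return the fastest block size recorded for the most similar past problem."""
    # Nearest past problem size plays the role of the learned model's input match.
    nearest = min({size for size, _, _ in history}, key=lambda s: abs(s - problem_size))
    runs = [(runtime, block) for size, block, runtime in history if size == nearest]
    return min(runs)[1]


# (problem_size, block_size, runtime_seconds): made-up timings for illustration.
history = [
    (1000, 16, 2.4), (1000, 32, 1.9), (1000, 64, 2.2),
    (8000, 16, 31.0), (8000, 32, 22.5), (8000, 64, 19.8),
]

choice_small = recommend_block_size(history, 1200)
choice_large = recommend_block_size(history, 6000)
print(choice_small, choice_large)
```

A real autotuner would also weigh quality of results against runtime, as the first bullet notes; this sketch optimizes runtime only.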


MLAutotunedHPC. Machine Learning for Parameter Auto-tuning in Molecular Dynamics Simulations: Efficient Dynamics of Ions near Polarizable Nanoparticles (NPs)

  • Integration of machine learning (ML) methods for parameter prediction for MD simulations by demonstrating how they were realized in MD simulations of ions near polarizable NPs.
  • Note ML used at start and end of simulation blocks

JCS Kadupitiya,

Geoffrey Fox,

Vikram Jadhao

ML-Based Simulation Configuration: Training, Testing, Inference I, Inference II

Results for Nanosimulation MLAutotuning

  • Auto-tuning of parameters generated accurate dynamics of ions for 10 million steps while improving the stability.
  • Integrated with ML-enhanced framework with hybrid OpenMP/MPI
  • Maximum speedup of 3 from MLAutoTuning and a maximum speedup of 600 from the combination of ML and parallel computing.

Key characteristics of simulated system showing greater stability for ML enabled adaptive approach.

Quality of simulation measured by time simulated per step with increasing use of ML enhancements. (Larger is better).

Inset is timestep used


MLAutoTuningHPC: Smart Ensembles

  • Here we choose the best set of parameters to achieve some computation goal
  • Such as providing the most efficient training set with defining parameters spread well over the relevant phase space.
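
One simple way to get a training set whose defining parameters are spread well over a region is greedy maximin (farthest-point) selection. The sketch below is an assumption about how such an ensemble could be chosen, not the method used in the talk; `spread_selection` and the two-parameter grid are hypothetical.

```python
import math


def spread_selection(candidates, count):
    """Greedy maximin choice: repeatedly add the candidate farthest from those already chosen."""
    chosen = [candidates[0]]
    while len(chosen) < count:
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(math.dist(c, s) for s in chosen),
        )
        chosen.append(best)
    return chosen


# Candidate (parameter_a, parameter_b) pairs on a 5 x 5 grid over the unit square.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
ensemble = spread_selection(grid, 4)
print(ensemble)
```

With four points requested, the selection ends up at the four corners of the square, which is the widest spread this grid allows.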


MLforHPC Simulation Surrogates

MLaroundHPC: a) Learning Outputs from Inputs:

  • Computation Results from Computation defining Parameters
  • Here one just feeds in a modest number of meta-parameters that define the problem and learns a modest number of calculated answers.
  • This presumably requires fewer training samples than “fields from fields” and is the main use so far

Operationally same as SimulationTrainedML but with a different goal: In SimulationTrainedML the simulations are performed to directly train an AI system rather than the AI system being added to learn a simulation.
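
The workflow can be sketched in a few lines: run the simulation at a modest number of parameter values, fit a surrogate, then answer new queries by lookup. Everything here is illustrative: `run_simulation` is a cheap stand-in for an expensive code, its parameter name is hypothetical, and piecewise-linear interpolation takes the place of a trained neural network.

```python
import bisect
import math


def run_simulation(salt_concentration):
    """Stand-in for an expensive simulation: returns one calculated answer."""
    return math.exp(-salt_concentration) + 0.1 * salt_concentration


class InterpolatingSurrogate:
    """Piecewise-linear lookup used here in place of a trained neural network."""

    def __init__(self, inputs, outputs):
        pairs = sorted(zip(inputs, outputs))
        self.xs = [x for x, _ in pairs]
        self.ys = [y for _, y in pairs]

    def predict(self, x):
        # Clamp to the training range, then interpolate between neighbours.
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        hi = bisect.bisect_right(self.xs, x)
        lo = hi - 1
        weight = (x - self.xs[lo]) / (self.xs[hi] - self.xs[lo])
        return self.ys[lo] + weight * (self.ys[hi] - self.ys[lo])


# Training: 21 simulation runs over the parameter range 0.0 to 2.0.
train_inputs = [0.1 * i for i in range(21)]
train_outputs = [run_simulation(x) for x in train_inputs]
surrogate = InterpolatingSurrogate(train_inputs, train_outputs)

# Lookup: compare the surrogate with a fresh simulation at an unseen parameter.
query = 0.73
error = abs(surrogate.predict(query) - run_simulation(query))
print(error)
```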


MLaroundHPC: ML for High Performance Surrogates of nanosimulations

  • An example of Learning Outputs from Inputs: Computation Results from Computation defining Parameters
  • Employed to extract the ionic structure in electrolyte solutions confined by planar and spherical surfaces.
  • Written with C++ and accelerated with hybrid MPI-OpenMP.
  • MLaroundHPC successfully learns desired features associated with the output ionic density that are in excellent agreement with the results from explicit molecular dynamics simulations.
  • Will be deployed on nanoHUB for education (an attractive use of surrogates)


ANN for Regression

  • ANN was trained to predict three continuous variables: contact density ρc, mid-point (center of the slit) density ρm, and peak density ρp
  • TensorFlow, Keras and Sklearn libraries were used in the implementation
  • Adam optimizer, Xavier normal initialization, mean squared error loss function, dropout regularization.


  • Dataset having 6,864 simulation configurations was created for training and testing (0.7:0.3) the ML model.
  • Note learning network quite small


Parameter Prediction in Nanosimulation

  • ANN-based regression model predicted contact density ρc, mid-point (center of the slit) density ρm, and peak density ρp accurately, with success rates of 95.52% (MSE ~ 0.0000718), 92.07% (MSE ~ 0.0002293), and 94.78% (MSE ~ 0.0002306) respectively, easily outperforming other non-linear regression models
  • Success means within error bars (2 sigma) of Molecular Dynamics Simulations
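
The two metrics quoted above can be written down directly. The sketch below assumes each MD value comes with a one-sigma error bar; the function names and the toy numbers are illustrative and are not data from the study.

```python
def success_rate(predicted, simulated, sigma):
    """Fraction of ML predictions lying within 2 sigma of the simulated value."""
    hits = sum(
        1 for p, s, e in zip(predicted, simulated, sigma) if abs(p - s) <= 2.0 * e
    )
    return hits / len(predicted)


def mean_squared_error(predicted, simulated):
    """Average squared difference between prediction and simulation."""
    return sum((p - s) ** 2 for p, s in zip(predicted, simulated)) / len(predicted)


# Toy values for a density such as the contact density (illustrative only).
md_values = [0.50, 0.80, 1.20, 0.30]
md_sigma = [0.02, 0.02, 0.05, 0.01]
ml_values = [0.51, 0.79, 1.35, 0.30]

rate = success_rate(ml_values, md_values, md_sigma)
mse = mean_squared_error(ml_values, md_values)
print(rate, mse)
```

Here three of the four predictions fall inside the error bars, so the success rate is 0.75.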


Accuracy comparison between ML predictions and MD simulation results

ρc , ρm and ρp predicted by the ML model were found to be in excellent agreement with those calculated using the MD method; data from either approach fall on the dashed lines which indicate perfect correlation.


Speedup of MLaroundHPC

  • Tseq is sequential time
  • Ttrain is time for a (parallel) simulation used in training ML
  • Tlearn is time per point to run machine learning
  • Tlookup is time to run inference per instance
  • Ntrain is number of training samples
  • Nlookup is number of results looked up




  • Becomes Tseq/Ttrain if ML not used
  • Becomes Tseq/Tlookup (10^5 times faster in our case) if inference dominates (will overcome end of Moore’s law and win the race to zettascale)
  • Another factor, as inference uses one core while the parallel simulation uses 128 cores
  • Strong scaling, as no need to parallelize more than effective number of nodes

Ntrain is 7K to 16K in our work
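
The speedup expression itself is not present in this text. One form consistent with the definitions and with both limiting cases listed here is S = Tseq (Nlookup + Ntrain) / (Tlookup Nlookup + (Ttrain + Tlearn) Ntrain); treat it as a reconstruction under that assumption. A minimal sketch with made-up timings checks the two limits:

```python
def effective_speedup(t_seq, t_train, t_learn, t_lookup, n_train, n_lookup):
    """Assumed form: S = Tseq (Nlookup + Ntrain) / (Tlookup Nlookup + (Ttrain + Tlearn) Ntrain)."""
    work_replaced = t_seq * (n_lookup + n_train)
    time_spent = t_lookup * n_lookup + (t_train + t_learn) * n_train
    return work_replaced / time_spent


# Illustrative timings in seconds (not measurements from the talk).
T_SEQ, T_TRAIN, T_LEARN, T_LOOKUP = 1000.0, 10.0, 0.5, 0.01

# No ML: nothing is looked up and no learning cost is paid, giving Tseq / Ttrain.
no_ml = effective_speedup(T_SEQ, T_TRAIN, 0.0, T_LOOKUP, n_train=7000, n_lookup=0)

# Inference dominates: Nlookup far exceeds Ntrain, approaching Tseq / Tlookup.
heavy_lookup = effective_speedup(
    T_SEQ, T_TRAIN, T_LEARN, T_LOOKUP, n_train=7000, n_lookup=10**12
)

print(no_ml, heavy_lookup)
```

With these numbers the first case gives Tseq/Ttrain = 100 and the second comes within a small fraction of a percent of Tseq/Tlookup = 10^5.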


MLforHPC Simulation Surrogates

MLaroundHPC: b) Learning Outputs from Inputs: Fields from Fields

  • Here one feeds in initial conditions and the neural network learns the result where initial and final results are fields
  • There is also c) Learning Outputs from Inputs: output fields from Computation defining Parameters combining a) and b)


How does one do this for case c) ?

  • If you are learning particular output features (as in Computation Results from Computation defining Parameters), then a simple (not necessarily deep) neural net suffices
  • If you want to learn the output fields (from either input fields or input Computation defining Parameters), then a more sophisticated approach is appropriate
  • Recent paper “Massive computational acceleration by using neural networks to emulate mechanism based biological models” uses 501 LSTM units to represent a one-dimensional grid of values which is output of a two-dimensional gene circuit simulation which only depends on radius.
  • Note LSTM models sequences, and one gets sequences in either time (usual LSTM application) or space
  • The ML representation allowed a much richer parameter sweep showing features not in training set
  • Performance improved by a factor of 30,000


Massive computational acceleration by using neural networks to emulate mechanism based biological models


Agents and Time-Series Case Studies

Learning Model Details

MLaroundHPC: Learning Model Details

a) Learning Agent Behavior: One has a model such as a set of cells as agents modeling a virtual tissue. One can use ML to learn dynamics of cells, replacing detailed computations by ML surrogates.

  • As there can be millions to billions of such agents, the performance gain can be huge, as each agent uses the same learned model.
  • This is MLaroundHPC for cells but MLAutotuning for multi-cell (tissue) phase


MLaroundHPC: Learning Model Details

b) Learning Effective Potentials or Interaction Graphs: An effective potential is an analytic, quasi-empirical or quasi-phenomenological potential that combines multiple, perhaps opposing, effects into a single potential.

  • This is classic coarse graining strategy
  • Deep Learning replacing dimension reduction techniques


MLaroundHPC: Learning Model Details ML for Data Assimilation

Take the case where we have “videos” recording observational data i.e. data is a high dimensional (spatial extent) time series

c) Learning Agent Behavior, a Predictor-Corrector approach: Here one time-steps the model and at each step optimizes the parameters to minimize divergence between simulation and ground truth data.

  • Example: produce a generic model organism such as an embryo. Take this generic model as a template and learn the different adjustments for particular individual organisms.
  • Build on Ride hailing work
  • Current state of the art expresses spatial structure as a convolutional neural net and time dependence as recurrent neural net (LSTM)
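
A toy version of the predictor-corrector idea, for a single scalar model rather than a CNN/LSTM: predict the next state from the current observation, compare with the next observation, and nudge the model parameter to shrink the mismatch. The decay model, the damped Gauss-Newton correction and all names and numbers are assumptions made for illustration.

```python
def step(x, k, dt):
    """Predictor: one explicit Euler step of dx/dt = -k x."""
    return x * (1.0 - k * dt)


DT = 0.1
TRUE_K = 0.5  # used only to generate synthetic "ground truth" observations

observations = [1.0]
for _ in range(20):
    observations.append(step(observations[-1], TRUE_K, DT))

k = 0.1     # initial guess for the model parameter
GAIN = 0.5  # damping of the corrector

for t in range(len(observations) - 1):
    predicted = step(observations[t], k, DT)   # predictor
    residual = predicted - observations[t + 1]  # divergence from ground truth
    sensitivity = -observations[t] * DT         # d(predicted)/dk
    k -= GAIN * residual / sensitivity          # corrector: damped Gauss-Newton update

print(k)
```

Each correction halves the parameter error in this linear case, so the estimate settles close to the value that generated the data.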


Yan Liu@USC ICML 2019 Time Series workshop


Include Space (Convolutional Graph) and Time (RNN)


Actually Really Need To do Everything Simultaneously in MLaroundHPC


Simulating Biological Organisms (with James Glazier @IU)


Learning Agent Behavior
Replace components by learned surrogates
(Reaction Kinetics Coupled ODE’s)

Predictor-Corrector

Smart Ensembles

All steps use MLAutotuning


Challenges and Opportunities

Computer Science Issues

Computer Science Issues I

  • Hundreds of Ph. D. theses!
  • What computations can be assisted by what ML in which of 9 scenarios
    • What is performance of different DL (ML) choices in compute time and capability
    • What can we do with zettascale computing?
  • Redesign all algorithms so they can be ML-assisted
  • Dynamic interplay (of data and control) between simulations and ML seems likely but not clear at present
  • There ought to be important analogies between time series and this area (as simulations are 4D time series?)
    • Exploit MLPerf examples?


Computer Science Issues II

  • Little study of best ANN structure especially for hardest cases such as “predict fields from fields” and ML assisted data assimilation
    • Not known how large training set needs to be.
    • Most published data has quite small training sets
  • I am surprised that there is not a rush to get effective performances on simulations of exascale and zettascale on current machines
  • Interesting load balancing issues if in parallel case some points learnt using surrogates and some points calculated from scratch
  • Little or no study of either predicting errors or in fact of getting floating point numbers from deep learning.


System Hardware for ML and HPC - HPCforML

  • HPCforML needs to support high performance computation and high performance I/O
    • Communication needs to be good for all machine learning unless it is pleasingly parallel
    • Graph algorithms can have high MPI/synchronization overhead as well as demands on memory systems
    • Not certain that deep learning will always be suitable for GPUs; some papers claim CPUs are better than GPUs on RNN/LSTM, whereas GPUs are better than CPUs on the CNNs used in many cases
  • Need high speed networks, local fast disks and accelerator which can vary by application
  • Note pleasingly parallel computations dominate many areas and these will not need high performance communication


System Hardware for ML and HPC - MLforHPC

  • MLforHPC needs to support large scale simulations, large scale data analytics and their integration
  • Integration sometimes as distinct ML and HPC computation jobs – run simulation for training data and then run ML
  • But sometimes intertwined in a single job.
  • Dynamic MLAutotuning, Effective potential and ML assisted data assimilation give intertwined jobs
  • Need ML optimization and Simulation optimization spread through machine and fast ways for ML and simulation to exchange data
  • This could imply heterogeneous accelerators and fast I/O and internode communication to enable ML and Computation to run together.


Conclusions

  • Global AI and Modeling Supercomputer GAIMSC is a good framework, with HPC Cloud linked to HPC Edge
    • Training on cloud; inference and some training on the edge
  • HPDC/HPC is essential for the future of the world
    • Everybody needs systems
    • Need to align communities to ensure HPC importance is recognized
  • Good to work closely with industry
    • Student internships; collaborations such as contributing to MLPerf
  • MLforHPDC/HPC is very promising, where we could aim at
    • First zettascale effective performance in next 2 years
    • Hardware/software aimed at general ML-assisted speedup of computation
    • Your health can be engineered with ML-assisted personalized nanodevices designed based on the ML-assisted digital twin of disease in your tissues

Extras

Graphs measuring popularity of areas

Papers Submitted: Comparing 7 Conferences and 2 Sums

HPDC is not included as this talk was given there. SoCC (the other major cloud conference) is not included as no statistics could be found.

Arxiv Publications from Aiindex.org

In 2017, the absolute numbers of papers were

AI: 23,922; CS: 383,279; All: 3,032,731

H5-index correlated with conference size

H5 is the h-index calculated over the last 5 years, as produced in July 2018

Some smaller areas: Google Trends last 5 years (Topics unless otherwise stated)

  • Cyberinfrastructure
  • Fog Computing
  • HPC
  • Parallel Computing (Programming Paradigm)
  • Edge Computing

  • Some small fields. Edge is growing but still pretty small

Some growing areas: Google Trends last 5 years (Topics unless otherwise stated)

  • Kubernetes (Comp. App)
  • Docker (Software)
  • Amazon Web Services
  • Artificial Intelligence
  • Azure (Comp. App)

  • Cloud features are growing quite fast

Extras

Details on SysML conference

SysML Conference Stanford March 31 - April 2, 2019

Extras

Data Science and Jobs

Gartner on Data Science, Data Engineering and Software Engineering

  • Gartner says that job numbers in data science teams are
    • 10% - Data Scientists (quite a small fraction)
    • 20% - Citizen Data Scientists ("decision makers")
    • 30% - Data Engineers
    • 20% - Business experts
    • 15% - Software engineers
    • 5% - Quant geeks
    • ~0% - Unicorns (very few exist!)

Indeed.com Trends

  • Note that Job Seeker and Jobs Posted are reversed

NIPS 2015 http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Gartner says that there are 3 times as many jobs for data engineers as for data scientists.

Communities/Expertise in Future World

  • Hard core ML community enhances Machine Learning Algorithms
  • AI First Engineering community uses and advances
    • AI
    • HPC and Cyberinfrastructure
    • Parallel and Distributed Computing
    • Edge Computing and Internet of Things
  • To build High Performance Big Data systems addressing many important research and community/industry needs
  • All of this is part of Applied Computer Science

Extras

More details in GAIMSC -- especially edge

Overall Global AI and Modeling Supercomputer GAIMSC Architecture II

  • There is a very distributed set of devices surrounded by local Fog computing; this forms the logically and physically distributed edge
  • The edge is structured and largely data
  • These are two differences from the Grid of the past
  • Note that a self-driving car will have its own fog and will not share that fog with the truck it is about to collide with
  • The cloud and edge will both be very heterogeneous with varying accelerators, memory size and disk structure.

Extras

HPC for ML Details

HPCforML (Systems for ML) in detail

  • HPCforML can be further subdivided
    • HPCrunsML: Using HPC to execute ML with high performance
    • SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations. (We return to this as it is similar to MLaroundHPC)
  • Twister2 supports HPCrunsML by using high performance technology everywhere, and this has been my major emphasis over the last 5 years

Diversion on Twister2

Twister2 Highlights I

  • A “Big Data Programming Environment” like Hadoop, Spark, Flink, Storm and Heron, but one that uses HPC wherever appropriate and outperforms Apache systems, often by large factors
  • Runs preferably under Kubernetes, Mesos or Nomad, but Slurm is also supported
  • Highlight is high performance dataflow supporting iteration, fine-grain, coarse grain, dynamic, synchronized, asynchronous, batch and streaming
  • Three distinct communication environments
    • DFW: Dataflow with distinct source and target tasks; data level, not message level; data-level communications spill to disk as needed
    • BSP: for parallel programming; MPI is the default. Inappropriate for dataflow
    • Storm API for streaming events with pub-sub such as Kafka
  • Rich state model (API) for objects supporting in-place, distributed, cached, RDD (Spark) style persistence with Tsets (see Pcollections in Beam, Datasets in Flink, Streamlets in Storm, Heron)

Twister2 Highlights II

  • Can be a pure batch engine
    • Not built on top of a streaming engine
  • Can be a pure streaming engine supporting Storm/Heron API
    • Not built on top of a batch engine
  • Fault tolerance (June 2019) as in Spark or MPI today; dataflow nodes define natural synchronization points
  • Many APIs: Data, Communication, Task
    • High level hiding communication and decomposition (as in Spark) and low level (as in MPI)
  • DFW supports MPI and MapReduce primitives: (All)Reduce, Broadcast, (All)Gather, Partition, Join with and without keys
  • Component based architecture -- it is a toolkit
    • Defines the important layers of a distributed processing engine
    • Implements these layers cleanly aiming at high performance data analytics
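
The collective primitives named above can be illustrated with a minimal, single-process sketch. This is not the Twister2 API: the function names are hypothetical, and each "worker" is simply one slot in a Python list, so the code shows only what each operation is expected to return, not how it is communicated.

```python
from functools import reduce
from operator import add

# Each list element stands for the local value held by one worker (rank).

def broadcast(workers, root=0):
    """Every worker ends up with the root worker's value."""
    return [workers[root] for _ in workers]

def gather(workers, root=0):
    """The root collects all values; other workers receive None."""
    return [list(workers) if rank == root else None
            for rank in range(len(workers))]

def all_gather(workers):
    """Every worker receives the full list of values."""
    return [list(workers) for _ in workers]

def all_reduce(workers, op=add):
    """Every worker receives the reduction of all values."""
    total = reduce(op, workers)
    return [total for _ in workers]

def partition_by_key(records, n_workers):
    """Route (key, value) records so equal keys land on the same worker."""
    buckets = [[] for _ in range(n_workers)]
    for key, value in records:
        buckets[hash(key) % n_workers].append((key, value))
    return buckets

def join_by_key(left, right):
    """Inner join of two lists of (key, value) records on equal keys."""
    index = {}
    for key, value in left:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]
```

In a real dataflow or MPI setting the same results would be produced by message exchange between processes; the sketch only pins down the semantics.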

Parallel SVM using SGD execution time for 320K data points with 2000 features and 500 iterations, on 16 nodes with varying parallelism

Times: Spark RDD > Twister2 Tset > Twister2 Task > MPI
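
For readers unfamiliar with the benchmark kernel, the following is a minimal sketch of a data-parallel linear SVM trained with mini-batch SGD. It is not the benchmarked code: the workers are simulated in one process, the AllReduce is a plain average, and the toy data set, learning rate and regularization constant are all assumptions chosen for illustration.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def hinge_subgradient(w, batch, lam):
    """Subgradient of (lam/2)*|w|^2 + mean hinge loss over one mini-batch."""
    grad = [lam * wj for wj in w]
    for x, y in batch:
        if y * dot(w, x) < 1.0:  # this point violates the margin
            for j, xj in enumerate(x):
                grad[j] -= y * xj / len(batch)
    return grad

def all_reduce_mean(vectors):
    """Stand-in for an AllReduce: average the per-worker gradient vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train_parallel_svm(data, n_workers=4, iterations=200, batch_size=10,
                       lr=0.1, lam=0.01, seed=0):
    rng = random.Random(seed)
    shards = [data[i::n_workers] for i in range(n_workers)]  # data parallelism
    w = [0.0] * len(data[0][0])
    for _ in range(iterations):
        grads = []
        for shard in shards:  # each simulated worker samples its own mini-batch
            batch = rng.sample(shard, min(batch_size, len(shard)))
            grads.append(hinge_subgradient(w, batch, lam))
        g = all_reduce_mean(grads)  # the only communication step per iteration
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

def make_toy_data(n=200, seed=0):
    """Hypothetical two-class data: feature 0 carries the label, feature 2 is a bias."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        data.append(([2.0 * y + rng.gauss(0, 0.5), rng.gauss(0, 1.0), 1.0], y))
    return data

def accuracy(w, data):
    return sum(1 for x, y in data if y * dot(w, x) > 0) / len(data)
```

The single averaging step per iteration is where a framework's AllReduce performance matters, which is one plausible reason the ordering of times tracks how close each system is to MPI.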

MLforHPC in detail

  • MLforHPC can be further subdivided into several categories:
    • MLAutotuning: Using ML to configure (autotune) ML or HPC simulations.
    • MLafterHPC: ML analyzing results of HPC as in trajectory analysis and structure identification in biomolecular simulations
    • MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations or parts of simulations. The same ML wrapper can also learn configurations as well as results. This is the most important category.
    • MLControl: Using simulations (with HPC) in control of experiments and in objective driven computational campaigns. Here the simulation surrogates are very valuable to allow real-time predictions.
  • Twister2 supports MLforHPC by allowing nodes of HPC dataflow representation to be wrapped with ML and it supports GAIMSC by integrating edge with cloud
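
As an illustration of the MLaroundHPC idea, here is a minimal sketch of a learned surrogate. Everything in it is an assumption made for the example: the "simulation" is a toy Euler integration of exponential decay, and the surrogate is a distance-weighted nearest-neighbour regressor rather than the deep networks typically used in practice.

```python
def simulate_decay(rate, steps=1000, t_end=1.0):
    """Toy 'expensive' simulation: explicit Euler for dx/dt = -rate * x, x(0) = 1."""
    dt = t_end / steps
    x = 1.0
    for _ in range(steps):
        x += dt * (-rate * x)
    return x

class KNNSurrogate:
    """Distance-weighted k-nearest-neighbour regression on one input parameter."""

    def __init__(self, k=2):
        self.k = k
        self.samples = []

    def fit(self, params, outputs):
        self.samples = list(zip(params, outputs))
        return self

    def predict(self, query):
        nearest = sorted(self.samples, key=lambda s: abs(s[0] - query))[: self.k]
        for param, output in nearest:
            if param == query:  # exact hit: reuse the stored simulation result
                return output
        weights = [1.0 / abs(param - query) for param, _ in nearest]
        weighted = sum(wt * out for wt, (_, out) in zip(weights, nearest))
        return weighted / sum(weights)

# Run the simulation on a coarse grid once, then answer new queries cheaply.
train_rates = [0.1 * i for i in range(21)]
surrogate = KNNSurrogate().fit(train_rates, [simulate_decay(r) for r in train_rates])
estimate = surrogate.predict(0.53)  # no new simulation run is needed here
```

The design point is that the cost of the 21 training runs is paid once, after which each prediction is a lookup and a weighted average instead of a full integration.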

Software model supported by Twister2

Continually interacting in the Intelligent Aether with

Suitable for MLforHPC?

Extras

Details on ML for HPC

Comments on Compendium of MLforHPC Research

  • http://dsc.soic.indiana.edu/publications/Learning%20EverywhereResource.pdf
  • 111 Citations with very short comments
  • Incomplete (deliberately) on MLafterHPC as this is mature but still important
  • Incomplete on MLControl. Need to improve
  • Larger fraction of MLaroundHPC and MLAutotuningHPC work, where nearly all citations are 2017 or later
  • Key divisions for MLaroundHPC/MLAutotuning
    • Computer Science: either use ML to improve the algorithm or the system architecture used to implement it: I have not found much work
    • Particle Dynamics: largest component, with smart sampling, effective potentials, “Computation Results from Computation defining Parameters” with “simple” deep learning replacing sophisticated dimension reduction; material science properties very active
    • Agent-based Simulations in networked systems or virtual tissues. Perhaps most promising as inevitably data driven, since there are no fundamental equations for cells, cars, people, bacteria
    • Partial Differential Equations: not so well developed as the above two areas

MLAutoTuningHPC: Learning Model Setups from Observational Data

  • Seen when a simulation is set up as a set of agents.
  • Tuning agent (model) parameters to optimize agent outputs to available empirical data presents one of the greatest challenges in model construction.
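
The calibration problem described above can be sketched as follows. This is a deliberately simple stand-in: the agent model (agents that adopt a behaviour with a fixed probability per step), the "observed" data and the exhaustive search over candidate values are all hypothetical, and an ML-based tuner would replace the search with a learned model.

```python
import random

def run_agent_model(adopt_prob, n_agents=2000, n_steps=10, seed=0):
    """Toy agent-based model: each non-adopter adopts with adopt_prob per step.

    Returns the fraction of agents that have adopted after each step.
    """
    rng = random.Random(seed)
    adopted = [False] * n_agents
    curve = []
    for _ in range(n_steps):
        for i in range(n_agents):
            if not adopted[i] and rng.random() < adopt_prob:
                adopted[i] = True
        curve.append(sum(adopted) / n_agents)
    return curve

def mismatch(curve_a, curve_b):
    """Sum of squared differences between two adoption curves."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))

def calibrate(observed, candidates, seed=1):
    """Pick the candidate parameter whose simulated curve best fits the data."""
    return min(candidates,
               key=lambda p: mismatch(run_agent_model(p, seed=seed), observed))

# Hypothetical empirical data, generated here with a hidden parameter of 0.3.
observed_curve = run_agent_model(0.3, seed=42)
candidate_probs = [round(0.05 * i, 2) for i in range(1, 20)]
best_prob = calibrate(observed_curve, candidate_probs)
```

Every candidate costs one full simulation run, which is why the search becomes the bottleneck for realistic models and why learning the parameter-to-output map is attractive.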

Results for MLAutotuning

  • An ANN-based regression model was integrated with MD simulation and predicted an excellent simulation environment 94.3% of the time; human operation is more like 20% (student) to 50% (faculty) and runs the simulation slower to be safe.
  • Auto-tuning of parameters generated accurate dynamics of ions for 10 million steps while improving the stability.
  • Integrating the ML-enhanced framework with hybrid OpenMP/MPI parallelization techniques reduced the computational time of simulating systems with thousands of ions and induced charges from thousands of hours to tens of hours, yielding a maximum speedup of 3 from MLAutoTuning and a maximum speedup of 600 from the combination of ML and parallel computing.
  • The approach can be generalized to select optimal parameters in other MD applications & energy minimization problems.
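
The quoted speedups can be related with a short calculation. It assumes, and this is only an assumption, that the ML and parallel speedups compose multiplicatively; the 6000-hour baseline is a hypothetical figure chosen to match the "thousands of hours" scale.

```python
def implied_parallel_speedup(combined_speedup, ml_speedup):
    """Parallel speedup implied if combined = ml_speedup * parallel_speedup."""
    return combined_speedup / ml_speedup

def reduced_runtime_hours(baseline_hours, speedup):
    """Runtime after applying a given speedup to a baseline runtime."""
    return baseline_hours / speedup

# Maximum figures from the bullets above: 3x from MLAutoTuning, 600x combined.
parallel_part = implied_parallel_speedup(600, 3)   # 200.0
# Hypothetical baseline of 6000 hours, reduced by the combined 600x speedup.
hours_after = reduced_runtime_hours(6000, 600)     # 10.0
```

Under that assumption the parallelization contributes a factor of 200, and the ML tuning multiplies it by a further factor of 3.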

Quality of simulation measured by time simulated per step with increasing use of ML enhancements (larger is better). The inset shows the timestep used.

Key characteristics of simulated system showing greater stability for ML enabled adaptive approach.

Comparison of results for peak densities of counterions between adaptive (ML) and original non-adaptive cases (they look identical)

Ionic densities from MLAutotuned system. Inset compares ML system results with those of slower original system

Rapid access to trendlines using ML Surrogates

(Top) Trendlines for contact density vs. ion diameter
(Bottom) Trendlines for contact density vs. confinement length

ML predictions are within the error bars generated via MD simulations (1%).

Contact, peak, and center-of-the-slit (mid-point) density vs. salt concentration.

MLControl

  • Experiment Control: Using simulations (possibly with HPC) in control of experiments and in objective-driven computational campaigns. Here the simulation surrogates are very valuable to allow real-time predictions. Applied in Material Science and Fusion
  • Experiment Design: One of the biggest challenges of models is the uncertainty in the precise model structures and parameters. Model-based design of experiments (MBDOE) assists in the planning of highly effective and efficient experiments; it capitalizes on the uncertainty in the models to investigate how to perturb the real system to maximize the information obtained from experiments. MBDOE with new ML assistance identifies the optimal conditions for stimuli and measurements that yield the most information about the system, given practical limitations on realistic experiments
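
One simple way to read the experiment design idea is sketched below. It is an illustration under stated assumptions, not the MBDOE method itself: the saturating response model, the set of plausible parameter values and the "pick the stimulus where plausible models disagree most" criterion are all hypothetical choices for the example.

```python
def saturating_response(stimulus, half_point):
    """Hypothetical model: response rises with stimulus and then saturates."""
    return stimulus / (stimulus + half_point)

def prediction_spread(stimulus, plausible_half_points):
    """Variance of predicted responses across the still-plausible models."""
    preds = [saturating_response(stimulus, h) for h in plausible_half_points]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def choose_experiment(candidate_stimuli, plausible_half_points):
    """Pick the feasible stimulus where the plausible models disagree most."""
    return max(candidate_stimuli,
               key=lambda s: prediction_spread(s, plausible_half_points))

# The candidate list encodes the practical limits on what can be applied.
feasible_stimuli = [0.1, 0.5, 2.0, 8.0, 50.0]
uncertain_half_points = [1.0, 4.0]
next_stimulus = choose_experiment(feasible_stimuli, uncertain_half_points)
```

Measuring at very small or very large stimuli tells little here because both models predict nearly the same response; the most informative feasible experiment sits in between, where the uncertainty in the model shows up as a large difference in predictions.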

Extras

Simulation v Big Data Challenges

Big Data and Simulation Comparison of Difficulty in Parallelism
