1 of 113

HDR2

from Harnessing the Data Revolution to Harvesting the Data Revolution

NSF HDR PI Meeting, Oct 26-27, 2022

2 of 113

Lightning talk

Fill out the single shared slide deck for your presentations next week. One presentation per award

  • 1 min presentation for TRIPODS Phase I and DSC (title + up to 2 slides)
  • 2 min presentation for TRIPODS Phase II and Institute (title + up to 4 slides)

3 of 113

Oct 26 Wed (Last name A~J)

4 of 113

DSC-WAV: Wrangle, Analyze, Visualize

Valerie Barr (co-PI)

Margaret Hamilton Distinguished Professor of Computer Science

Director of the Bard Network Computing Initiative

Bard College

5 of 113

Student teams collaborate with local community organizations to address a data science question through a semester-long project

  • Faculty-mentored student teams
  • Develop real world experience by solving open-ended problems
  • Scrum methodology for dynamic development & project management skills
  • Best practices in inclusive collaboration, version control, client communication, and code review

DSC-WAV leadership working to establish formal, transparent pathways to four year programs from community colleges:

6 of 113

imageomics.org

7 of 113

imageomics.org

8 of 113

9 of 113

Tripods Phase I: Tufts Tripods

HDR Tripods: Building the Foundation for a Data Intensive Study Center

Lenore Cowen (PI)

Professor, Computer Science

Professor, Math & GSBS

Tufts University

10 of 113

T-Tripods supports Interdisciplinary Research in the Foundations of Data Science

3 Foundation Research Foci:

RF1: Y1-4: Graphs and Tensors

RF2: Y2-4: Spatial/Temporal Data

RF3: Y3-4: Data Guarantees: Quality, Transparency, Privacy, Fairness, Trust

4 application area domains

11 of 113

T-Tripods also connects research to education and broadening participation in Data Science

Graduate:

Interdisciplinary Graduate Training: Advising Trio model

Undergraduates:

DIAMONDS: Directed Intensive and Mentored Opportunities in Data Science: Broadening participation for all

DIAMONDS REU: summer 2022

12 of 113

HDR DSC: AI Across the Statewide Curriculum

Jennifer Drew (PI)

Senior Lecturer

Microbiology and Cell Science

College of Agriculture and Life Sciences

Satyanarayan Dev (Co-PI)

Associate Professor and Chair

Biological Systems Engineering

College of Agriculture and Food Sciences

13 of 113

Goal: 3-year program to build a diverse and skilled AI workforce by enhancing the reach and impact of undergraduate AI curriculum

14 of 113

Data Science Corps: Connecting the Dots

HDR DSC: Collaborative Research: Connecting the Dots

Jeffrey R. Errington, PI

Professor and Associate Dean

University at Buffalo

15 of 113

16 of 113

TRIPODS Phase I: UMASS TRIPODS

TRIPODS Institute for Theoretical Foundations of Data Science

Patrick Flaherty, co-PI

Associate Professor of Mathematics & Statistics

UMass Amherst

17 of 113

UMass TRIPODS Accomplishments

  • Summer pre-college data science course based on data8 with 4 scholarships in 2022
  • 37 data science foundations and applications publications + 6 preprints
  • Successful REU program with student poster presentations on summer projects
  • Multiple virtual workshops on foundations of data science (https://sites.google.com/view/dstheory)
  • Placement of two postdocs in TT faculty positions: Bryant University & Syracuse University

18 of 113

DSC: Central Coast Data Science Partnership

Training a New Generation of Data Scientists

Alexander Franks, co-PI

Assistant Professor, Statistics

University of California, Santa Barbara

19 of 113

Central Coast Data Science Partnership

  • Data science training connecting three main public higher education institutions in California: UC, CSU, and Community Colleges

(HDR DSC Awards #1924205 & #1924008)

20 of 113

Central Coast Data Science Partnership

New Courses

Capstone Projects

Summer Research Experience

Data Science Fellows

  • “Committee” participation
    • Outreach
    • Education
    • Events
    • Infrastructure
  • $5k stipend for DS fellows
  • Participate in research and Capstone Projects

Interns and Fellows serve as “ambassadors” of data science at their schools

21 of 113

22 of 113

23 of 113

24 of 113

DSC: DaMADScience Corps

The DelAware And MiD-Atlantic Science Corps

Jing Gao (co-PI)

Assistant Professor of Geospatial Data Science

University of Delaware

25 of 113

DaMADScience Corps: The DelAware And MiD-Atlantic Science Corps

PI: Bianco; UD CoPIs: Gao, Dobler; LU CoPI: Tamez; DSU CoPI: Boukari

A partnership among the University of Delaware (UD), Lincoln University (LU), and Delaware State University (DSU) to create an equitable, accessible program for data science education

We are building a joint educational program of bootcamps, courses, hackathons, and research that:

  • Is accessible to students of any background and STEM preparation level
  • Supports students' education and career goals across disciplines (not only STEM)
  • Provides job-ready skills including data-ethics training
  • Builds capacity at HBCUs for data science training leveraging UD’s experience in Data Science education
  • Builds skills and frameworks for equitable education at UD and supports recruitment of diverse scholars into Data Science

26 of 113

TRIPODS Phase I: IDEAL

Institute for Data, Econometrics, Algorithms, and Learning

Varun Gupta (co-PI)

Associate Professor of Operations Management

The University of Chicago Booth School of Business

27 of 113

Goal: Cross-campus and cross-disciplinary collaborations

CS

Stats

Econ

OR

EE

Law

Research foci:

  • High dimensional data analysis
  • Data science in strategic environments
  • Machine learning and optimization

28 of 113

29 of 113

30 of 113

HDR Institute: A3D3

Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery

Shih-Chieh Hsu

Director

Associate Professor, Physics

University of Washington

Phil Harris

Deputy Director

Assistant Professor, Physics

Massachusetts Institute of Technology

Mark Neubauer

Community Engagement Coordinator

Professor, Physics

Affiliate Professor, ECE and NCSA

University of Illinois Urbana-Champaign

31 of 113

NSF HDR Institute A3D3: Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery

Our vision is to establish a tightly coupled organization of domain scientists, computer scientists, and engineers that unites three core components essential to achieving real-time AI that transforms science and engineering discoveries.

9 institutions, 17 senior personnel, 2 research scientists, 10 postdocs, 15 graduate students, 12 undergraduates, 4 postbacs

32 of 113

Research, Education, and Community Engagement to push the boundaries of data processing beyond industry technologies

SONIC

PyLog

High Energy Physics

Multi-messenger Astrophysics

Hardware-Algorithm co-development

Neuroscience

Community Engagement

33 of 113

iHARP: NSF HDR Institute for Harnessing Data and Model Revolution in the Polar Regions

iHARP Focus Area Leadership Team

Vandana Janeja

(Director)

UMBC 

Mathieu Morlighem

(Co-Director)

Dartmouth

Jianwu Wang (Co-PI)  

UMBC

Aneesh Subramanian 

(Co-PI)  

CU Boulder

Shashi Shekhar (Co-PI)

UMN

34 of 113

Data Science and Polar Science: A Continuum

  • Polar regions are complex systems
    • Components: ice sheet, atmosphere, ocean, sea ice
    • Subcomponents: supraglacial hydrological system, subglacial pathways
    • Interconnections among components

  • Leverage a multi-source dataset of surface and near-surface features
    • Link sparse data into time series and networks
    • Mine patterns, spatio-temporal relationships

  • Better understand component interactions
    • To inform ice-sheet and climate models with more realistic physical relationships
    • To reduce uncertainty in projections of the future mass balance of ice sheets, sea level rise, and climate change

iHARP Vision

iHARP advances our understanding of the response of polar regions to climate change and its global impacts by deeply integrating data science and polar science to spur physics-informed, data-driven discoveries.

35 of 113

Spatio-temporal Mining

Computer Vision

Prediction and Causal Inference

Scalability

Education and Outreach

Lenaerts et al. 2019

36 of 113

HDR DSC: Data Science for Energy Transition

Mikyoung Jun, PI

Professor of Mathematics & ConocoPhillips Data Science Professor

University of Houston

37 of 113

HDR DSC: Data Science for Energy Transition

A multi-institutional team of five universities (UH, UHD, UHV, UHCL, SHSU) in the greater Houston region, in partnership with multiple energy-related industries

The PI/Co-PI/Sr Personnel team consists of experts from statistics, computer science, engineering, geoscience, and public and energy policy

Each year, the program will produce about 40 students (undergraduate and Master's) from diverse backgrounds, trained in data science skills essential for the energy industry as well as in statistics, computer science, geoscience, and public policy

The program consists of a 5-week summer camp, research team projects, and a summer internship

38 of 113

DSC: Interdisciplinary Traineeship for Socially Responsible and Engaged Data Scientists (iTREDS)

Thomas Mustillo (Co-PI)

Associate Professor

Keough School of Global Affairs

University of Notre Dame

Kristin Kuter (Co-PI)

Associate Professor and Chair

Mathematics and Computer Science

Saint Mary’s College

With: Nitesh Chawla (PI), CS; Ann Marie Conrado, Design; Don Howard, Philosophy; Ron Metoyer, CS; Ewa Misiolek, Math & CS; Danielle Wood, Planning; Chris Wedrychowicz, Math & CS

39 of 113

The T-Shaped iTREDS Scholar: Breadth in Profession Superskills; Depth in Data Acumen

Vision: A university education develops undergraduate students who can see the implications of their work to society.

Construct: An interdisciplinary and experiential learning program for students working together with stakeholders on data-driven problems.

Contribution: Instill a mindset at the intersection of a data-centered and human-centered approach for a “21st century data-capable workforce.”

40 of 113

The iTREDS Curriculum & Student Profile

  • Interdisciplinary: St. Mary’s College Math and CS + Notre Dame CS + Notre Dame Arts & Letters
  • Diverse: 66% women; 19% URM
  • Sustainable: Three cohorts of 25 students each: Fall 2020, Fall 2021, Fall 2022, …

41 of 113

TRIPODS Phase I: Rutgers DIMACS

Post-Doctoral Associates

Center for Discrete Math and Theoretical Computer Science (DIMACS)

Rutgers University

Ewerton Vieira

Cameron Thieme

42 of 113

DATA-INSPIRE: “DATA science for INtelligent Systems and People Interaction that integrates Research and Education Activities”

This institute is premised on our belief that advances in data science principles are needed to impact the emerging paradigm of intelligent machines and their convergence with human society. This foundational understanding is needed to further improve the performance and better explain the operation of such machines so they can accomplish diverse, real-world tasks and interact effectively with people.

14 faculty:

4 CS

4 Math

4 Stat

2 PostDoc

43 of 113

End of lightning talk session 1

Note: When leaving please use double doors in back

We will start again at 1:30

44 of 113

Oct 27 Thursday (Last name L~Z)

45 of 113

HDR DSC: The MCDC

The Metropolitan Chicago Data science Corps: Learning from Data to Support Communities

Suzan van der Lee

Principal Investigator

Professor

Northwestern University

Michelle Birkett, Mark Potosnak, Eunice Santos, Pascal Paschos, Nadja Insel, Yoo-Seong Song, Arend Kuyper, Francisco Iacobelli, Sara Woods, Denise Drane, Bennett Goldberg, Matthew Sperry

46 of 113

47 of 113

Community partners

48 of 113

NSF DSC: DS-PATH

Data Science Career Pathways in the California Inland Empire

Paea LePendu (co-PI)

Assistant Professor of Teaching

Computer Science & Engineering

UC Riverside

Mariam Salloum (PI)

Assistant Professor of Teaching

Computer Science & Engineering

UC Riverside

49 of 113

NSF DSC: DS-PATH

Data Science Career Pathways in the California Inland Empire

NEW: grades 6-12

DS curricula, teacher training, outreach++

50 of 113

DSC: Collaborative Research: Transforming Data Science Education through a Portable and Sustainable Anthropocentric Data Analytics for Community Enrichment (ADACE) Program

Yu Liang (PI)

Prof. of Computer Science,

UTC

CS+EE+Math+Biology+Chem+MD+CivilE+Sociology

51 of 113

Objective: establishing a community-engaged, multidisciplinary education and research program for anthropocentric data analytics.

Topics of Anthropocentric Data Analytics:

  • By the human: e.g., HIL ML, HMI
  • Of the human: e.g., Interpretable NN, physics-guided NN
  • For the human: e.g., AI ethics, Social network, AI-enabled medical device

Accomplishments

  • Developed an interdisciplinary curriculum
  • Workforce training
  • Workshops/hackathons/bootcamps
  • ADACE research projects (15+ journal papers, two datasets so far)
  • Extended cooperation with local businesses (training program, sub-projects, etc.)

Some of the participants of the ADACE Workshop 2022

Stakeholders of ADACE

Life cycle of Anthropocentric Data Analytics

Representative ADACE research projects

52 of 113

DSC: Community-centered DS for Engineering Students

Collaborative Research: Infusion of data science and computation into engineering curricula (2021)

Dr. Wesley Reinhart, Co-PI

Asst Prof, Materials Science

Penn State University

Dr. Rebecca Napolitano, PI

Asst Prof, Architectural Engineering

Penn State University

53 of 113

DSC: Community-centered DS for Engineering Students

This material is based upon work supported by the National Science Foundation under Grant IIS-2123343.​

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

  • Curated community datasets with notes about curricular infusion/use

  • Plug-n-play lesson templates across engineering

  • Teach-the-teacher documentation to facilitate domain experts starting their own classes

How do we provide engineering students the opportunity to learn DS using community data and real-world problems?

54 of 113

Collaborative Research: HDR-DSC: Building Capacity in Data Science through Biodiversity, Conservation, and General Education

Kathleen L. Prudic, PhD (lead PI)

Assistant Professor, School of Natural Resources and the Environment

University of Arizona

Lewis and Clark College

Greta Binford, PhD (lead co-PI)

University of Arizona

55 of 113

Build data science capacity through General Education curriculum

  • Leverage PUI pedagogy with R1 resources and HSI diversity
  • Expose students in life sciences to data science applications and principles early and often
  • Upskill instructors with paid data science skills and pedagogy training
  • Incorporate library training resources to support high touch learning

56 of 113

Using interest in conservation to create more on-ramps for data science training and professional development

Dealing with Data in the Wild

(Intro Data Science)

Freshmen/Sophomores

General Education Course

Applied Data Science

(Adv Intro Data Science)

Sophomore +

Inter-institutional, Collaborative Project-Based Course with USFWS

Instructor Training

(Intro Data Science and Pedagogy)

Life Science Undergraduates, Grad Students, Post-Docs, and Faculty

The Carpentries Training

57 of 113

DSC: Earth Data Science Corps (EDSC)

HDR DSC: Earth Data Science Corps - Fulfilling Workforce Demand at the Intersection of Environmental Science and Data Science

Nathan Anderson Quarderer (current co-PI)

Postdoctoral Associate; Interim Education Director Earth Lab/ESIIL; CU Boulder/CIRES

Jennifer Balch; Director Earth Lab/ ESIIL (current PI; former co-PI)

Leah Wasser; pyOpenSci

(former PI)

58 of 113

DSC: Earth Data Science Corps (EDSC)

  • 5 partner institutions incl. 2 TCUs, 1 HSI, 1 CC
  • 60 students; 8 faculty partners
  • Women, minorities, Indigenous communities
  • 12-week paid internship; 100% virtual
  • Training + Immersive project-based learning
  • Open source solutions to EES and GIS using Python
  • 3rd yr of funding + 1 yr extension
  • EDSC → ESIIL [Stars] (Award # 2153040 [DBI])

59 of 113

DSC: Earth Data Science Corps (EDSC)

60 of 113

HDR TRIPODS I: D4 Institute

D4 (Dependable Data-Driven Discovery) Institute

Hridesh Rajan (PI)

Professor and Chair of Computer Science at Iowa State University

hridesh@iastate.edu

Breno Dantas Cruz

Postdoctoral Fellow, Computer Science at Iowa State University

bdantasc@iastate.edu

D4 Team

Theoretical and Applied Data Science Initiative

kielkopf@iastate.edu

61 of 113

D4 Institute: Goals

  • Advancing the foundations of dependable data-driven discovery. Dependability is critical for both discoveries and decisions because unreliable discoveries can have catastrophic impacts.
  • D4 Institute is developing an overall framework for dependable data-driven discovery: risks (what can go wrong?), measures (how to quantify risks?), and mechanisms (how to mitigate the risks?).
  • Phase I focuses on a subset of the Data Science lifecycle and four risks (complexity, uncertainty, freshness, and resource constraints).
  • Broader impact activities have focused on creating a hub for sharing data science expertise and educating DS researchers.

DS Lifecycle

62 of 113

Diagnosing Faults in Deep Learning

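from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# X, Y, batch_size, and epoch are assumed to be defined by the surrounding training script.
model = Sequential()
model.add(Dense(784, input_shape=(784,)))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer=SGD())
model.fit(X, Y, batch_size=batch_size, epochs=epoch)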

DL faulty patterns have to be detected at training.

Problem Statement

Illustrative Example

Faulty behavior diagnosis

Suggest fix to model source code

DNN model training

Key Insights

Link model fault trends to problems in the DNN source code.

Analyze trends during DL model training to identify faults.

Traditional software test suites are not applicable to DL-based software.

The map can be extended to link fault trends to problems in the DNN source code.

Each DL model is unique (e.g., architecture and data).

Trend analysis of model metrics

weights

Faulty pattern
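The idea of analyzing metric trends during training can be sketched in a few lines. This is only a minimal illustration, not the D4 Institute's tool: the function name `diagnose_loss_trend`, the `window` and `tol` parameters, and the four labels are all assumptions made for the example, and the rules are deliberately simple.

```python
from statistics import mean


def diagnose_loss_trend(losses, window=3, tol=1e-3):
    """Label a per-epoch loss history using simple, illustrative rules.

    Hypothetical helper: the labels and thresholds are assumptions,
    not part of any published diagnosis tool.
    """
    # NaN is the only float that is not equal to itself.
    if any(l != l or l == float("inf") for l in losses):
        return "diverged"
    if len(losses) < 2 * window:
        return "insufficient data"
    early = mean(losses[:window])    # average loss over the first epochs
    late = mean(losses[-window:])    # average loss over the last epochs
    if late > early * (1 + tol):
        return "increasing"
    if early - late <= tol * abs(early):
        return "stalled"
    return "decreasing"


if __name__ == "__main__":
    history = [2.30, 2.30, 2.30, 2.30, 2.30, 2.30]
    print(diagnose_loss_trend(history))
```

A label such as "stalled" or "diverged" is the kind of trend that could then be mapped back to a candidate problem in the model source code, such as a mismatch between the output activation and the loss function.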

63 of 113

TRIPODS Phase I: UIC

HDR TRIPODS: UIC Foundations of Data Science Institute

Lev Reyzin (PI)

Professor of Mathematics, Statistics, and Computer Science

University of Illinois at Chicago

64 of 113

Highlights and Organizing Faculty

UIC Phase I found novel ways to promote data science and transdisciplinary collaboration locally, within one university.

  • creation of data science BS degree
  • new cross-departmental seminars
  • 3 long-term visitors
  • workshops for high schoolers
  • strong growth in data science
  • >20 funded students + 1 postdoc
  • multiple research breakthroughs

Natasha Devroye

ECE, CoPI

Will Perkins

MSCS, CoPI

Tasos Sidiropoulos

CS, CoPI

Elena Zheleva

CS, CoPI

Institute Director

Lev Reyzin

MSCS, PI

65 of 113

TRIPODS Phase II: (UIC) + (NU + TTI-C + UC) + IIT

Institute for Data, Econometrics, Algorithms and Learning (IDEAL)

Lev Reyzin (Lead PI)

Professor of Mathematics, Statistics, and Computer Science

University of Illinois at Chicago

66 of 113

Team Composition, by Field

Math

CS/Law

EE

CS/Stat

OR/Stat

CS/Math

CS/Econ

Industry

Stat

CS

Econ/Stat

67 of 113

Institute Overview

The institute’s research agenda focuses on solving key foundational problems in data science, ranging from the core foundations of data science to its interfaces with other disciplines.

Through its activities the institute will broaden participation in data science locally and nationally, build a lasting research and educational infrastructure, and foster strong connections throughout Chicago.

We leverage the strong ties we’ve built among world-class research groups in core areas of data science (CS, EE, probability, statistics) and exceptional researchers outside the traditional center of data science (econ, law, logic, OR).

Additionally, the involvement of Google Research adds to our technical strengths and will allow us to have more real-world impact with a direct connection to industry.

Institute Goals

A Comprehensive View of DS

+

68 of 113

Local Community, Regional Connections, National Impact

69 of 113

Key Initiatives

special programs

summer workshops

problem sessions

cross-institutional seminars

pre-REU workshops

teacher workshops

public lectures

exhibits at the MSI

annual meeting

industry affiliates day

weekly team meetings

cross-institution courses

undergraduate supervision

graduate fellows

postdoctoral program

visiting fellows

Research Programs

Educational Programs

Personnel

Recurring Events

70 of 113

EnCORE: The Institute for Emerging CORE Methods in Data Science

TRIPODS Phase II: UCLA, UCSD, U-Penn, UT-Austin

Hamed Hassani

(University of Pennsylvania)

71 of 113

EnCORE: The Institute for Emerging CORE Methods in Data Science

  • Complex & Massive Data
  • Optimization
  • Responsible Learning
  • Education & Engagement

The EnCORE vision is to transform the landscape of these four core pillars of data science.

EnCORE

72 of 113

  • PI/co-PIs from four different HDR TRIPODS Phase I institutes:
    • Penn Institute for Foundations of Data Science;
    • Institute on Foundations of Data Science;
    • TRIPODS Institute for Theoretical Foundations of Data Science; and
    • Topology Geometry Data Analysis (TGDA) NSF TRIPODS Center.

  • Six PI-mentored junior faculty as senior personnel from Stony Brook Univ, Harvard Univ, Purdue Univ, Syracuse Univ, and Santa Clara Univ.

  • PI/co-PIs from four universities: UCSD, UPenn, UT Austin, and UCLA.

  • Institute affiliates from many neighboring universities.

The Team EnCORE

73 of 113

EnCORE: Personnel

Chaudhuri (CS)

Chawla* (CS)

Dasgupta (CS)

Fletcher* (Stat)

Graham (Math)

Hassani (EE)

Mazumdar (EE)

Mishne* (Math/EE)

Meka (CS)

Pappas (EE)

Roth (CS)

Saha* (CS)

Sanghavi (EE)

Sarkar* (Stat)

Tchetgen (Stat)

Wang* (CS/Math)

Ward* (Math)

55.5% representation (10/18) of underrepresented groups,

50% representation (9/18) of highly accomplished women.

Gandhi (CS)

Hashemi (EE/CS)

Gandikota (EE/CS)

74 of 113

EnCORE: Management & Governance

Terence Tao, Maria Klawe, Jelani Nelson

External Advisory Board

Staff Support: 50% dedicated staff support from CSE (UCSD): Scott Blair (Website Maintenance), Jocelyn Bernardo (Event Management), Katie Ismael (Communication Support).

Saura Naderi (DEI Support-50%), Thinkabit Lab has impacted 74K+ K-12 students

75 of 113

  • Complexities of Data: Challenges throughout the data life cycle due to complex characteristics of the data, and exploiting structures to overcome them.
    • Complex and Massive Data
    • Exploiting Structures
  • Optimization: Need for new theory for data-driven optimization
  • Responsibility: Societal and Ethical responsibility in data-driven decision making. Not a constraint but a consideration.
  • Domain Sciences: Applications of theory to domain sciences: neuroscience, epidemiology, material science, and economics. Further applications to ecology, HEP, climate science through partnerships with HDR institutes.

EnCORE: Research Themes

76 of 113

TRIPODS Phase I: UCD4IDS

HDR TRIPODS: UC Davis TETRAPODS Institute of Data Science

Naoki Saito (PI)

Professor, Department of Mathematics, UC Davis

77 of 113

Multiscale Basis Dictionaries on Higher-Order Networks

via a vertical collaboration with S. Schonsheck (postdoc) & E. Shvarts (PhD student)

Building multiscale basis dictionaries (including Haar-Walsh bases) for analyzing data recorded on the edges and faces of a simplicial complex rather than on its nodes.
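For orientation only, the classical Haar transform of a one-dimensional signal is the simplest example of a multiscale basis. The sketch below is that textbook transform, swapped in as an illustration; it does not reproduce the construction on simplicial complexes described on this slide, and the function name `haar_transform` is an assumption made for the example.

```python
import math


def haar_transform(signal):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns [coarsest average, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in signal]
    details = []
    root2 = math.sqrt(2)
    while len(coeffs) > 1:
        pairs = [(coeffs[i], coeffs[i + 1]) for i in range(0, len(coeffs), 2)]
        averages = [(a + b) / root2 for a, b in pairs]     # coarser-scale signal
        differences = [(a - b) / root2 for a, b in pairs]  # detail at this scale
        details = differences + details  # coarser details end up in front
        coeffs = averages
    return coeffs + details


if __name__ == "__main__":
    print(haar_transform([4, 2, 5, 5]))
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, and a constant signal has all of its energy in the single coarsest coefficient.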

78 of 113

HDR DSC: Data Science at Engineering/Biology Interface

HDR DSC: Engaging Undergraduates in Data and Decisions Research at the Engineering/Biology Interface

David Schmale (PI)

Professor

College of Ag & Life Sciences

Virginia Tech

79 of 113

HDR DSC: Data Science at Engineering/Biology Interface

80 of 113

DSC: SoCal Data Science

Data Science Training and Practices: Preparing a Diverse Workforce via Academic and Industrial Partnership

81 of 113

Structure of the Program

Aim: To recruit, train, and dispatch a diverse workforce of data scientists

Recruit: Students (87% women/URM) are recruited to be fellows from all three institutions:

  • 6 fellows from UCI
  • 20 fellows from CSUF
  • 6 fellows from Cypress

Train: Students take data science related courses at each institution

Research: All fellows participate in Summer Research Experience at UCI

82 of 113

Year One!

Curriculum: New courses were initiated at the three participating institutions, primarily modeled after the introductory course to data science at UCI.

Summer Bootcamp: In a span of a week, a host of technical topics and basic skills were introduced.

Summer Research: In partnership with various research entities, Fellows took part in an intensive 6-week research program.

Research Symposium: Fellows presented their work to faculty and students, friends and family, and community members.

Workshops: Multiple workshops, led by the PIs, were held at CSUF and Cypress College; two summer school programs for high school students were offered

83 of 113

TRIPODS Phase I: Deep and Graph Learning

NSF HDR TRIPODS Institute on the Foundations of Graph and Deep Learning

Jeremias Sulam (Co-PI)

Assistant Professor

MINDS & Biomedical Engineering Department

Johns Hopkins University

84 of 113

Mission

To establish the fundamental mathematical, statistical and computational principles behind the analysis and interpretation of complex high-dimensional data.​

Faculty

Mathematics, Applied Mathematics & Statistics​

Biomedical Engineering​

Computer Science​

Electrical & Computer Engineering​

Research

  • Foundations of Graph Learning
    • Analysis of networked dynamical systems
    • Learning on graphs: graph signal processing, GNNs
    • Learning of graphs: maps, metrics, distributions
    • Statistical inference on attributed graphs
    • Spectral geometry and statistical network analysis

  • Foundations of Deep Learning
    • Analysis of convergence of learning algorithms
    • Analysis of expressivity of graph neural networks
    • Analysis of robustness of neural networks
    • Design of neural network architectures
    • Implicit regularization properties of deep networks

Education and Training

  • >30 Data Science Fellows

  • Masters in Data Science: AMS, CS, MINDS, Spring 2020​
  • PhD Dissertation Awards: Fall 2018, Fall 2019, Fall 2020​
  • Seminar Series: more than 50% female and URM​
  • Winter School on Foundations of Graph and Deep Learning, Winter 2021​
  • Annual Symposia in Fall 2017, Fall 2019, Spring 2020, Fall 2020, Spring 2021​

85 of 113

NSF Institute for Data-Driven Dynamical Design (ID4)

Eric Toberer

Director of ID4

Professor of Physics

Colorado School of Mines

Jane Greenberg

Associate Director of Data Science

Professor of Information Science

Drexel U.

Steven Lopez

Associate Director of Outreach

Associate Professor of Chemistry

Northeastern U.

86 of 113

NSF Institute for Data-Driven Dynamical Design (ID4)

ID4 develops new use-inspired machine learning solutions for addressing outstanding challenges in materials and structures for energy and sustainability.

Cross-cutting these challenges is a need to efficiently understand, predict, and control the collective dynamics of complex systems in high dimensions.

87 of 113

ID4: Domains at the tipping point

Ion transport

  • Hydrogen motion in fuel cells and electrolyzers
  • Complex, correlated atomic motion

Structural metamaterials

  • Environmentally responsive structures
  • Sustainable construction

Photocatalysis

  • Catalysts without critical materials
  • Many-body quantum mechanics

Porous frameworks for gas separation

  • CO2 adsorption and capture
  • Metal-organic framework synthesis

atomistic

aperiodic

assembly

stochastic

continuum

crystalline

application

deterministic

Diversity of Dynamical Phenomena

88 of 113

ID4: Accelerating design and creating a community

Building STEM talent and engaging a wider audience:

  • Girls Who Code & related K-12 camps
  • Research for high school and undergraduate students
  • Postbaccalaureate bridge program
  • Visiting fellows program
  • Virtual research for CC students
  • Workshops to unite the HDR community

Innovating for the future:

ID4 transfers foundational advances in computer science and statistics into open-source, user-friendly tools for practitioners in the physical sciences and engineering. Examples include JAX FDM and Allegro.

Next HDR-wide meeting in Oct 2023, likely at CSM in Golden CO

Data-focused meeting in May 2023 at Drexel U.

89 of 113

ID4: Accelerating design and creating a community

The interesting part: What challenges do we need help on?!

  • Easy access to flexible, cohesive training at the intersection of science and data science
  • New algorithms for dynamical systems, dimensional reduction
  • Rich systems with so many analysis opportunities
  • Automated/accelerated experiment
  • Hiring
  • Code development, use, and dissemination
  • Long tail of data generation; associated metadata
  • FAIR
  • Data waste, missing data
  • Connecting REU students across the nation!!!

https://tinyurl.com/datafuntimes https://www.mines.edu/id4/

90 of 113

ID4: How to get engaged with ID4?

  • Partnership-driven supplement request
  • Capstone projects
  • Postbacc Fellow program
  • Visiting Fellow program (travel funds)
  • Meeting co-organization

https://tinyurl.com/datafuntimes https://www.mines.edu/id4/

91 of 113

Institute for Geospatial Understanding through an Integrative Discovery Environment (I-GUIDE)

Shaowen Wang, PI and Director, University of Illinois Urbana-Champaign (UIUC)

Anand Padmanabhan, Managing Director, UIUC

X. Carol Song, Co-PI

Purdue University

Mark Daniel Ward, SP

Purdue University

92 of 113

Leadership Team

93 of 113

Vision and Mission

  • Discovery through digital and location connections
  • Harnessing the geospatial data revolution for sustainability solutions
  • Map, Connect, Discover

94 of 113

95 of 113

Convergence Curriculum for Geospatial Data Science

  • Support multiple learner pathways

96 of 113

HDR DSC: National Data Mine Network

Mark Daniel Ward

PI

Director of The Data Mine

American Statistical Association

97 of 113

98 of 113

99 of 113

100 of 113

Goal: Create a model for increasing access to data science training for students from historically marginalized groups.

  • New pathways:
    • 3+2 program for students in STEM, BS STEM Spelman + MS DS MSU
    • Minor in data science at Spelman
  • Innovations:
    • Summer bridge program tuned to Spelman students' needs → MSU
    • Developing team taught courses at Spelman and MSU that involve both sides
    • 12 credits of lab research for 3+2 students
    • Training faculty at Spelman in data science, culminating in faculty research experience and new course materials
    • Training MSU faculty in what it is like to work with students who come from a historically marginalized group
  • Current progress
    • First cohort of students
    • First cohort of faculty

101 of 113

Contact: matteson@cornell.edu

102 of 113

103 of 113

104 of 113

Special diet meals are plated.

Please talk to the hotel staff for special requests.

105 of 113

Dear Colleague Letter: Reproducibility and Replicability in Science

October 25, 2022

Dear Colleagues:

A 2019 consensus study report published by the National Academies of Sciences, Engineering, and Medicine (NASEM) discussed the meaning of the terms replicability and reproducibility and identified approaches for researchers, academic institutions, journals, and funders to improve reproducibility and replicability in science [1]. In July 2021, at NSF's request, NASEM convened an expert meeting focused on National Science Foundation (NSF) policies and investments to make reproducible and replicable science easier for scientific communities to understand and execute and to embed reproducibility and replicability within the fundamental scientific method.

Through this Dear Colleague Letter (DCL), NSF reaffirms its commitment to advancing reproducibility and replicability in science. NSF is particularly interested in proposals addressing one or more of the following topics:

  1. Advancing the science of reproducibility and replicability.
  2. Research infrastructure for reproducibility and replicability.
  3. Educational efforts to build a scientific culture that supports reproducibility and replicability.

106 of 113

End of Lightning talk

107 of 113

Code of conduct

We are dedicated to providing a welcoming, supportive and inclusive environment for all people, regardless of background and identity. We do not tolerate discrimination or harassment of any kind. Any form of behavior to exclude, intimidate, or cause discomfort is a violation of the Code of Conduct. By participating in this community, participants accept to abide by the eScience Code of Conduct and accept the procedures by which any Code of Conduct incidents are resolved.

108 of 113

Welcome

109 of 113

Template (please DO NOT modify the following three pages)

110 of 113

Award type: short title of your award

Full title of your award

2” x 2”

Full name

Your position in the award

Your position in the institution

Institution name

111 of 113

Content slide 1

112 of 113

Content slide 2

113 of 113

Add your slides after this page

(order your slides by speaker’s last name)