HDR2
From Harnessing the Data Revolution to Harvesting the Data Revolution
NSF HDR PI Meeting, Oct 26-27, 2022
Lightning talk
Fill out the single shared slide deck for your presentations next week. One presentation per award
Oct 26 Wed (Last name A~J)
DSC-WAV: Wrangle, Analyze, Visualize
Valerie Barr (co-PI)
Margaret Hamilton Distinguished Professor of Computer Science
Director of the Bard Network Computing Initiative
Bard College
Student teams collaborate with local community organizations to address a data science question through a semester-long project
DSC-WAV leadership is working to establish formal, transparent pathways from community colleges to four-year programs.
imageomics.org
Tripods Phase I: Tufts Tripods
HDR Tripods: Building the Foundation for a Data Intensive Study Center
Lenore Cowen (PI)
Professor, Computer Science
Professor, Math & GSBS
Tufts University
T-Tripods supports Interdisciplinary Research in the Foundations of Data Science
3 Foundation Research Foci:
RF1: Y1-4: Graphs and Tensors
RF2: Y2-4: Spatial/Temporal Data
RF3: Y3-4: Data Guarantees: Quality, Transparency, Privacy, Fairness, Trust
4 application area domains
T-Tripods also connects research to education and broadening participation in Data Science
Graduate:
Interdisciplinary Graduate Training: Advising Trio model
Undergraduates:
DIAMONDS: Directed Intensive and Mentored Opportunities in Data Science: Broadening participation for all
DIAMONDS REU: summer 2022
HDR DSC: AI Across the Statewide Curriculum
Jennifer Drew (PI)
Senior Lecturer
Microbiology and Cell Science
College of Agriculture and Life Sciences
Satyanarayan Dev (Co-PI)
Associate Professor and Chair
Biological Systems Engineering
College of Agriculture and Food Sciences
Goal: 3-year program to build a diverse and skilled AI workforce by enhancing the reach and impact of undergraduate AI curriculum
Data Science Corps: Connecting the Dots
HDR DSC: Collaborative Research: Connecting the Dots
Jeffrey R. Errington, PI
Professor and Associate Dean
University at Buffalo
TRIPODS Phase I: UMASS TRIPODS
TRIPODS Institute for Theoretical Foundations of Data Science
Patrick Flaherty, co-PI
Associate Professor of Mathematics & Statistics
UMass Amherst
UMass TRIPODS Accomplishments
DSC: Central Coast Data Science Partnership
Training a New Generation of Data Scientists
Alexander Franks, co-PI
Assistant Professor, Statistics
University of California, Santa Barbara
Central Coast Data Science Partnership
(HDR DSC Awards #1924205 & #1924008)
New Courses
Capstone Projects
Summer Research Experience
Data Science Fellows
Interns and Fellows serve as “ambassadors” of data science at their schools
DSC: DaMADScience Corps
The DelAware And MiD-Atlantic Science Corps
Jing Gao (co-PI)
Assistant Professor of Geospatial Data Science
University of Delaware
DaMADScience Corps: The DelAware And MiD-Atlantic Science Corps
PI: Bianco; UD Co-PIs: Gao, Dobler; LU Co-PI: Tamez; DSU Co-PI: Boukari
A partnership between the University of Delaware (UD), Lincoln University (LU), and Delaware State University (DSU) to create an equitable, accessible program for data science education
We are building a joint educational program of bootcamps, courses, hackathons, and research that
TRIPODS Phase I: IDEAL
Institute for Data, Econometrics, Algorithms, and Learning
Varun Gupta (co-PI)
Associate Professor of Operations Management
The University of Chicago Booth School of Business
Goal: Cross-campus and cross-disciplinary collaborations
CS, Stats, Econ, OR, EE, Law
Research foci:
HDR Institute: A3D3
Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery
Shih-Chieh Hsu
Director
Associate Professor, Physics
University of Washington
Phil Harris
Deputy Director
Assistant Professor, Physics
Massachusetts Institute of Technology
Mark Neubauer
Community Engagement Coordinator
Professor, Physics
Affiliate Professor, ECE and NCSA
University of Illinois Urbana-Champaign
NSF HDR Institute A3D3: Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery
Our vision is to establish a tightly coupled organization of domain scientists, computer scientists, and engineers that unites three core components essential to achieving real-time AI that can transform science and engineering discoveries.
9 institutions, 17 senior personnel, 2 research scientists, 10 postdocs, 15 graduate students, 12 undergraduates, 4 postbacs
Research, education, and community engagement to push the boundaries of data processing beyond industry technologies
SONIC
PyLog
High Energy Physics
Multi-messenger Astrophysics
Hardware-Algorithm co-development
Neuroscience
Community Engagement
iHARP: NSF HDR Institute for Harnessing Data and Model Revolution in the Polar Regions
iHARP Focus Area Leadership Team
Vandana Janeja
(Director)
UMBC
Mathieu Morlighem
(Co-Director)
Dartmouth
Jianwu Wang (Co-PI)
UMBC
Aneesh Subramanian
(Co-PI)
CU Boulder
Shashi Shekhar (Co-PI)
UMN
Data Science and Polar Science- A Continuum
with more realistic physical relationships
the future mass balance of Ice Sheets,
sea level rise and climate change
iHARP Vision
iHARP advances our understanding of the response of polar regions to climate change and its global impacts by deeply integrating data science and polar science to spur physics-informed, data-driven discoveries.
Spatio-temporal Mining
Computer Vision
Prediction and Causal Inference
Scalability
Education and Outreach
Lenaerts et al. 2019
HDR DSC: Data Science for Energy Transition
Mikyoung Jun, PI
Professor of Mathematics & ConocoPhillips Data Science Professor
University of Houston
● A multi-institutional team of five universities (UH, UHD, UHV, UHCL, SHSU) in the greater Houston region, in partnership with multiple energy-related industries
● The PI/Co-PI/Sr Personnel team consists of experts from statistics, computer science, engineering, geoscience, and public and energy policy
● Each year, the program will produce about 40 students (undergraduate and Master's) from diverse backgrounds, trained in data science skills essential for the energy industry as well as in statistics, computer science, geoscience, and public policy
● The program consists of a 5-week summer camp, research team projects, and a summer internship
DSC: Interdisciplinary Traineeship for Socially Responsible and Engaged Data Scientists (iTREDS)
Thomas Mustillo (Co-PI)
Associate Professor
Keough School of Global Affairs
University of Notre Dame
Kristin Kuter (Co-PI)
Associate Professor and Chair
Mathematics and Computer Science
Saint Mary’s College
With: Nitesh Chawla (PI), CS; Ann Marie Conrado, Design; Don Howard, Philosophy; Ron Metoyer, CS; Ewa Misiolek, Math & CS; Danielle Wood, Planning; Chris Wedrychowicz, Math & CS
The T-Shaped iTREDS Scholar: Breadth in Profession Superskills; Depth in Data Acumen
Vision: A university education develops undergraduate students who can see the implications of their work to society.
Construct: An interdisciplinary and experiential learning program for students working together with stakeholders on data-driven problems.
Contribution: Instill a mindset at the intersection of a data-centered and human-centered approach for a “21st century data-capable workforce.”
The iTREDS Curriculum & Student Profile
TRIPODS Phase I: Rutgers DIMACS
Post-Doctoral Associates
Center for Discrete Math and Theoretical Computer Science (DIMACS)
Rutgers University
Ewerton Vieira
Cameron Thieme
DATA-INSPIRE: “DATA science for INtelligent Systems and People Interaction that integrates Research and Education Activities”
This institute is premised on our belief that advances in data science principles are needed to impact the emerging paradigm of intelligent machines and their convergence with human society. This foundational understanding is needed to further improve the performance and better explain the operation of such machines so they can accomplish diverse, real-world tasks and interact effectively with people.
14 faculty:
4 CS
4 Math
4 Stat
2 PostDoc
End of lightning talk session 1
Note: When leaving please use double doors in back
We will start again at 1:30
Oct 27 Thursday (Last name L~Z)
HDR DSC: The MCDC
The Metropolitan Chicago Data science Corps: Learning from Data to Support Communities
Suzan van der Lee
Principal Investigator
Professor
Northwestern University
Michelle Birkett, Mark Potosnak, Eunice Santos, Pascal Paschos, Nadja Insel, Yoo-Seong Song, Arend Kuyper, Francisco Iacobelli, Sara Woods, Denise Drane, Bennett Goldberg, Matthew Sperry
Community partners
NSF DSC: DS-PATH
Data Science Career Pathways in the California Inland Empire
Paea LePendu (co-PI)
Assistant Professor of Teaching
Computer Science & Engineering
UC Riverside
Mariam Salloum (PI)
Assistant Professor of Teaching
Computer Science & Engineering
UC Riverside
NEW: grades 6-12
DS curricula, teacher training, outreach++
DSC: Collaborative Research: Transforming Data Science Education through a Portable and Sustainable Anthropocentric Data Analytics for Community Enrichment (ADACE) Program
Yu Liang (PI)
Prof. of Computer Science,
UTC
CS+EE+Math+Biology+Chem+MD+CivilE+Sociology
Objective: establishing a community-engaged, multidisciplinary education and research program for anthropocentric data analytics.
Topics of Anthropocentric Data Analytics:
Accomplishments
Some of the participants of the ADACE Workshop 2022
Stakeholders of ADACE
Life cycle of Anthropocentric Data Analytics
Representative ADACE research projects
DSC: Community-centered DS for Engineering Students
Collaborative Research: Infusion of data science and computation into engineering curricula (2021)
Dr. Wesley Reinhart, Co-PI
Asst Prof, Materials Science
Penn State University
Dr. Rebecca Napolitano, PI
Asst Prof, Architectural Engineering
Penn State University
This material is based upon work supported by the National Science Foundation under Grant IIS-2123343.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
How do we provide engineering students the opportunity to learn DS using community data and real-world problems?
Collaborative Research: HDR-DSC: Building Capacity in Data Science through Biodiversity, Conservation, and General Education
Kathleen L. Prudic, PhD (lead PI)
Assistant Professor, School of Natural Resources and the Environment
University of Arizona
Lewis and Clark College
Greta Binford, PhD (lead co-PI)
University of Arizona
Build data science capacity through General Education curriculum
Using interest in conservation to create more on-ramps for data science training and professional development
Dealing with Data in the Wild
(Intro Data Science)
Freshman/Sophomores
General Education Course
Applied Data Science
(Adv Intro Data Science)
Sophomore +
Inter-institutional, Collaborative Project-Based Course with USFWS
Instructor Training
(Intro Data Science and Pedagogy)
Life Science Undergraduates, Grad Students, Post-Docs, and Faculty
DSC: Earth Data Science Corps (EDSC)
HDR DSC: Earth Data Science Corps - Fulfilling Workforce Demand at the Intersection of Environmental Science and Data Science
Nathan Anderson Quarderer (current co-PI)
Postdoctoral Associate; Interim Education Director, Earth Lab/ESIIL; CU Boulder/CIRES
Jennifer Balch; Director, Earth Lab/ESIIL (current PI; former co-PI)
Leah Wasser; pyOpenSci (former PI)
HDR TRIPODS I: D4 Institute
D4 (Dependable Data-Driven Discovery) Institute
Hridesh Rajan (PI)
Professor and Chair of Computer Science at Iowa State University
hridesh@iastate.edu
Breno Dantas Cruz
Postdoctoral Fellow, Computer Science at Iowa State University
bdantasc@iastate.edu
D4 Team
Theoretical and Applied Data Science Initiative
kielkopf@iastate.edu
D4 Institute: Goals
DS Lifecycle
Diagnosing Faults in Deep Learning
4
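from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# Illustrative model; X, Y, batch_size, and epoch are assumed to be defined by the caller.
model = Sequential()
model.add(Dense(784, input_shape=(784,)))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer=SGD())
model.fit(X, Y, batch_size=batch_size, epochs=epoch)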
Faulty DL patterns have to be detected during training.
Problem Statement
Illustrative Example
Faulty behavior diagnosis
Suggest fix to model source code
DNN model training
Key Insights
Link model fault trends to problems in the DNN source code.
Analyze trends during DL model training to identify faults.
Traditional software test suites are not applicable to DL-based software.
1
2
The map can be extended to link fault trends to problems in the DNN source code.
Each DL model is unique (e.g., architecture and data).
Trend analysis of model metrics
3
weights
Faulty pattern
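The key insights above (analyze trends in model metrics during training, then link those trends to faults) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the D4 Institute's tool: the function name `diagnose_loss_trend`, the symptom labels, and the `window` and `tol` thresholds are all hypothetical choices made for this example.

```python
import math
from typing import List


def diagnose_loss_trend(losses: List[float], window: int = 5, tol: float = 1e-3) -> List[str]:
    """Return hypothetical symptom labels found in a per-epoch loss history."""
    # A NaN or infinite loss is reported on its own; later checks would be meaningless.
    if any(math.isnan(v) or math.isinf(v) for v in losses):
        return ["numerical_error"]

    symptoms: List[str] = []
    if len(losses) >= window:
        recent = losses[-window:]
        if max(recent) - min(recent) < tol:
            # The loss is flat over the last `window` epochs.
            symptoms.append("loss_not_changing")
        elif all(later > earlier for earlier, later in zip(recent, recent[1:])):
            # The loss rose at every step of the last `window` epochs.
            symptoms.append("loss_increasing")
    return symptoms


if __name__ == "__main__":
    print(diagnose_loss_trend([2.3, 2.3, 2.3, 2.3, 2.3]))
    print(diagnose_loss_trend([1.0, 0.8, 0.6, 0.5, 0.4]))
```

In practice such a check would be fed the loss values recorded while `model.fit` runs, and each symptom label would then be mapped to a candidate problem in the model source code; that mapping is outside the scope of this sketch.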
TRIPODS Phase I: UIC
HDR TRIPODS: UIC Foundations of Data Science Institute
Lev Reyzin (PI)
Professor of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
Highlights and Organizing Faculty
UIC Phase I found novel ways to promote data science and transdisciplinary collaboration locally, within one university.
Natasha Devroye
ECE, CoPI
Will Perkins
MSCS, CoPI
Tasos Sidiropoulos
CS, CoPI
Elena Zheleva
CS, CoPI
Institute Director
Lev Reyzin
MSCS, PI
TRIPODS Phase II: (UIC) + (NU + TTI-C + UC) + IIT
Institute for Data, Econometrics, Algorithms and Learning (IDEAL)
Lev Reyzin (Lead PI)
Professor of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
Team Composition, by Field
Math
CS/Law
EE
CS/Stat
OR/Stat
CS/Math
CS/Econ
Industry
Stat
CS
Econ/Stat
Institute Overview
The institute’s research agenda focuses on solving key foundational problems in data science, ranging from the core foundations of data science to its interfaces with other disciplines.
Through its activities the institute will broaden participation in data science locally and nationally, build a lasting research and educational infrastructure, and foster strong connections throughout Chicago.
We leverage the strong ties we’ve built among world-class research groups in core areas of data science (CS, EE, probability, statistics) and exceptional researchers outside the traditional center of data science (econ, law, logic, OR).
Additionally, the involvement of Google Research adds to our technical strengths and will allow us to have more real-world impact with a direct connection to industry.
Institute Goals
A Comprehensive View of DS
+
Local Community, Regional Connections, National Impact
Key Initiatives
special programs
summer workshops
problem sessions
cross-institutional seminars
pre-REU workshops
teacher workshops
public lectures
exhibits at the MSI
annual meeting
industry affiliates day
weekly team meetings
cross-institution courses
undergraduate supervision
graduate fellows
postdoctoral program
visiting fellows
Research Programs
Educational Programs
Personnel
Recurring Events
EnCORE: The Institute for Emerging CORE Methods in Data Science
TRIPODS Phase II: UCLA, UCSD, U-Penn, UT-Austin
Hamed Hassani
(University of Pennsylvania)
EnCORE: The Institute for Emerging CORE Methods in Data Science
The EnCORE vision is to transform the landscape of these four core pillars of data science.
EnCORE
The Team EnCORE
EnCORE: Personnel
Chaudhuri (CS)
Chawla* (CS)
Dasgupta (CS)
Fletcher* (Stat)
Graham (Math)
Hassani (EE)
Mazumdar (EE)
Mishne* (Math/EE)
Meka (CS)
Pappas (EE)
Roth (CS)
Saha* (CS)
Sanghavi (EE)
Sarkar* (Stat)
Tchetgen (Stat)
Wang* (CS/Math)
Ward* (Math)
55.6% representation (10/18) of underrepresented groups,
50% representation (9/18) of highly accomplished women.
Gandhi (CS)
Hashemi (EE/CS)
Gandikota (EE/CS)
EnCORE: Management & Governance
Terence Tao, Maria Klawe, Jelani Nelson
External Advisory Board
Staff Support: 50% dedicated staff support from CSE (UCSD): Scott Blair (Website Maintenance), Jocelyn Bernardo (Event Management), Katie Ismael (Communication Support).
Saura Naderi (DEI Support, 50%); Thinkabit Lab has impacted 74K+ K-12 students
EnCORE: Research Themes
TRIPODS Phase I: UCD4IDS
HDR TRIPODS: UC Davis TETRAPODS Institute of Data Science
Naoki Saito (PI)
Professor, Department of Mathematics, UC Davis
Multiscale Basis Dictionaries on Higher-Order Networks
via a vertical collaboration with S. Schonsheck (postdoc) & E. Shvarts (PhD student)
Building multiscale basis dictionaries (including Haar-Walsh bases) for analyzing data recorded on the edges and faces of a simplicial complex instead of on its nodes.
HDR DSC: Data Science at Engineering/Biology Interface
HDR DSC: Engaging Undergraduates in Data and Decisions Research at the Engineering/ Biology Interface
David Schmale (PI)
Professor
College of Ag & Life Sciences
Virginia Tech
DSC: SoCal Data Science
Data Science Training and Practices: Preparing a Diverse Workforce via Academic and Industrial Partnership
Structure of the Program
Aim: To recruit, train, and dispatch a diverse workforce of data scientists
Recruit: Students (87% women/URM) are recruited to be fellows from all three institutions:
Train: Students take data science related courses at each institution
Research: All fellows participate in Summer Research Experience at UCI
Year One!
Curriculum: New courses were initiated at the three participating institutions, primarily modeled after the introductory course to data science at UCI.
Summer Bootcamp: In a span of a week, a host of technical topics and basic skills were introduced.
Summer Research: In partnership with various research entities, Fellows took part in an intensive 6-week research program.
Research Symposium: Fellows presented their work to faculty and students, friends and family, and community members.
Workshops: Multiple workshops, led by the PIs, were held at CSUF and Cypress College; two summer school programs for high school students were offered
TRIPODS Phase I: Deep and Graph Learning
NSF HDR TRIPODS Institute on the Foundations of Graph and Deep Learning
Jeremias Sulam (Co-PI)
Assistant Professor
MINDS & Biomedical Engineering Department
Johns Hopkins University
Mission
To establish the fundamental mathematical, statistical and computational principles behind the analysis and interpretation of complex high-dimensional data.
Faculty
Mathematics, Applied Mathematics & Statistics
Biomedical Engineering
Computer Science
Electrical & Computer Engineering
Research
Education and Training
NSF Institute for Data-Driven Dynamical Design (ID4)
Eric Toberer
Director of ID4
Professor of Physics
Colorado School of Mines
Jane Greenberg
Associate Director of Data Science
Professor of Information Science
Drexel U.
Steven Lopez
Associate Director of Outreach
Associate Professor of Chemistry
Northeastern U.
ID4 develops new use-inspired machine learning solutions for addressing outstanding challenges in materials and structures for energy and sustainability.
Cross-cutting these challenges is a need to efficiently understand, predict, and control the collective dynamics of complex systems in high dimensions.
ID4: Domains at the tipping point
Ion transport
Structural metamaterials
Photocatalysis
Porous frameworks for gas separation
atomistic
aperiodic
assembly
stochastic
continuum
crystalline
application
deterministic
Diversity of Dynamical Phenomena
ID4: Accelerating design and creating a community
Building STEM talent and engaging a wider audience:
Innovating for the future:
ID4 transfers foundational advances in computer science and statistics into open-source, user-friendly tools for practitioners in the physical sciences and engineering. Examples include JAX FDM and Allegro.
Next HDR-wide meeting in Oct 2023, likely at CSM in Golden CO
Data-focused meeting in May 2023 at Drexel U.
ID4: Accelerating design and creating a community
The interesting part: What challenges do we need help on?!
-Easy access to flexible, cohesive training at intersection of science/data science.
-New algorithms for dynamical systems, dimensional reduction
-Rich systems with so many analysis opportunities
-Automated/accelerated experiment
-Hiring
-Code development, use, and dissemination
-Long tail of data generation; associated metadata
-FAIR
-Data waste, missing data
-Connecting REU students across the nation!!!
ID4: How to get engaged with ID4?
HDR Institute: https://iguide.illinois.edu/
Institute for Geospatial Understanding through an Integrative Discovery Environment (I-GUIDE)
Shaowen Wang, PI and Director, University of Illinois Urbana-Champaign (UIUC)
Anand Padmanabhan, Managing Director, UIUC
X. Carol Song, Co-PI
Purdue University
Mark Daniel Ward, SP
Purdue University
Leadership Team
Vision and Mission
Convergence Curriculum for Geospatial Data Science
HDR DSC: National Data Mine Network
Mark Daniel Ward
PI
Director of The Data Mine
American Statistical Association
Goal: Create a model for increasing access to data science training for students from historically marginalized groups.
Contact: matteson@cornell.edu
Special diet meals are plated.
Please talk to the hotel staff for special requests.
Dear Colleague Letter: Reproducibility and Replicability in Science
October 25, 2022
Dear Colleagues:
A 2019 consensus study report published by the National Academies of Sciences, Engineering, and Medicine (NASEM) discussed the meaning of the terms replicability and reproducibility and identified approaches for researchers, academic institutions, journals, and funders to improve reproducibility and replicability in science [1]. In July 2021, at NSF's request, NASEM convened an expert meeting focused on National Science Foundation (NSF) policies and investments to make reproducible and replicable science easier for scientific communities to understand and execute and to embed reproducibility and replicability within the fundamental scientific method.
Through this Dear Colleague Letter (DCL), NSF reaffirms its commitment to advancing reproducibility and replicability in science. NSF is particularly interested in proposals addressing one or more of the following topics:
End of Lightning talk
Code of conduct
We are dedicated to providing a welcoming, supportive and inclusive environment for all people, regardless of background and identity. We do not tolerate discrimination or harassment of any kind. Any form of behavior to exclude, intimidate, or cause discomfort is a violation of the Code of Conduct. By participating in this community, participants accept to abide by the eScience Code of Conduct and accept the procedures by which any Code of Conduct incidents are resolved.
Welcome