Welcome to NCSSM-Morganton!

Data Science Summer Institute 2023

Taylor Gibson
Dean of Data Science and Interdisciplinary Initiatives

gibson@ncssm.edu

Hello, my name is Taylor Gibson 👋

I’m the Dean of Data Science at NCSSM.

My background is in teaching math and computer science. I also have a degree in biomedical engineering with a focus on neuroengineering 🧠

I worked with many colleagues to develop NCSSM's data science program.

I also maintain NCSSM's Jupyter cloud infrastructure.

You can reach me at gibson@ncssm.edu

Goals for the week

  • Learn about data science. We have great experts here to help!
  • Do data science. Solve problems with hands-on activities.
  • Network with colleagues. We've got some amazing participants here!
  • Have fun! Self-explanatory.

Who is in the room?

  • What grades do you teach? Middle school (6-8), high school (9-12), college (13-16), grad school
  • What subject(s) do you teach? Math/statistics, computer science, lab sciences, humanities, other
  • Programming background? None at all, some but a long time ago, pretty decent, software developer
  • Data science / statistics background? None at all, some but a long time ago, pretty decent, Hadley Wickham

Plan for the week

Monday

Meet one another, collect some data, get settled!

Tuesday-Thursday

Mornings: Complete classroom-style activities
Afternoons: Guest speakers from all over!

Happy Hour Reception: Wednesday @ Fonta Flora

Friday

National Landscape of Data Science Education
Data Science in Industry
Wrapping up

Learning outcomes

  • What is data science? With data science becoming the latest buzzword, it's important to agree on what we mean by data science.
  • Why teach data science? The reasons we chose to teach data science at NCSSM and to make it a focus of NCSSM-Morganton.
  • How to teach data science? Give students the opportunity to work with meaningful data and the space to explore and explain what they discover.
  • Who teaches data science? Data science requires instructors with foundational knowledge of statistics and programming who can help students solve problems in a variety of contexts, including the sciences and humanities.
  • What tools do instructors and students need to know? Data science often requires working with large datasets, and modern computational tools make that work efficient.

What is data science?

There are a lot of thoughts and opinions on the matter.

So, really, what is data science?

Data science

Applications

Implications

Foundations

Updated from Grolemund & Wickham's classic R4DS schematic, envisioned by Dr. Julia Lowndes for her 2019 useR! keynote talk and illustrated by Allison Horst.

Data collection and exploration

Collecting data from the field or using publicly available datasets.

Cleaning and wrangling data so it is properly formatted to facilitate analysis.

Combining multiple datasets into one.

Visualizing data to discover patterns and produce hypotheses.

Artwork by @allison_horst
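The cleaning and combining steps on this slide can be sketched in plain Python. This is a minimal sketch using only the standard library, assuming two small hypothetical tables (`raw_survey` and `grades`, invented for illustration) that share a `student_id` key; the helper names `clean_survey` and `inner_join` are also hypothetical. In a classroom, a library such as pandas or the tidyverse would usually do this work.

```python
# Hypothetical raw survey data: inconsistent IDs, stray spaces, and a missing value.
raw_survey = [
    {"student_id": "S01", "height_cm": " 162 "},
    {"student_id": "s02", "height_cm": "171"},
    {"student_id": "S03", "height_cm": ""},  # missing height
]

# Hypothetical second table to combine with the survey.
grades = [
    {"student_id": "S01", "grade": 10},
    {"student_id": "S02", "grade": 11},
    {"student_id": "S03", "grade": 11},
]


def clean_survey(rows):
    """Normalize IDs, convert heights to numbers, and drop rows with missing heights."""
    cleaned = []
    for row in rows:
        height_text = row["height_cm"].strip()
        if not height_text:
            continue  # skip missing values rather than guessing them
        cleaned.append({
            "student_id": row["student_id"].strip().upper(),
            "height_cm": float(height_text),
        })
    return cleaned


def inner_join(left, right, key):
    """Combine rows from two tables that share a value in `key` (assumes keys in `right` are unique)."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = inner_join(clean_survey(raw_survey), grades, "student_id")
for row in combined:
    print(row)
```

Student S03 disappears from the result because cleaning dropped the missing height; whether to drop, flag, or impute missing values is exactly the kind of decision worth discussing with students.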

Inference and simulation

Quantify whether a result is statistically significant or more likely due to random chance.

Perform hypothesis testing.

Our primary tool is randomization, which programming makes straightforward.

Artwork by @allison_horst
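One way to use randomization for a hypothesis test is a permutation test: shuffle the group labels many times and see how often chance alone produces a difference as large as the observed one. This is a minimal sketch, assuming two hypothetical groups of measurements (the values are invented for illustration) and a hypothetical helper named `permutation_test`; it estimates a two-sided p-value for a difference in means.

```python
import random
from statistics import fmean

# Hypothetical measurements for two groups (illustrative values, not real data).
group_a = [12.1, 11.8, 13.0, 12.6, 12.9, 13.3]
group_b = [11.2, 11.5, 12.0, 11.1, 11.9, 11.4]


def permutation_test(a, b, num_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling group labels."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # randomly reassign every value to group A or group B
        simulated = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so floating-point rounding doesn't miscount ties.
        if abs(simulated) >= abs(observed) - 1e-9:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / num_shuffles


observed_diff, p_value = permutation_test(group_a, group_b)
print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Estimated p-value: {p_value:.4f}")
```

A small estimated p-value means that shuffled labels rarely produce a difference as large as the observed one, so random chance alone is an unlikely explanation. Students can see the idea without any formula for a sampling distribution.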

Prediction and classification

Making informed, quantitative guesses.

Techniques can include regression and classification.

Introduces students to the field of machine learning.

Artwork by @allison_horst
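Regression is one way to make an informed, quantitative guess. This is a minimal sketch of simple least-squares linear regression in plain Python, assuming hypothetical data on hours studied versus quiz score (the numbers are invented for illustration) and a hypothetical helper named `fit_line`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)².
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x̄, ȳ)
    return slope, intercept


# Hypothetical data: hours studied vs. quiz score (illustrative values only).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 79]

slope, intercept = fit_line(hours, scores)
prediction = slope * 6 + intercept  # predicted score after 6 hours of study
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted score for 6 hours: {prediction:.1f}")  # 84.5
```

Predicting at 6 hours goes slightly beyond the observed range of 1 to 5 hours, which is a natural opening to discuss the risks of extrapolation.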

How is data science different from statistics?

It all comes down to scale.

Open science and reproducible research

Scientific results and evidence are strengthened if those results can be replicated and confirmed by several independent researchers.

When researchers properly document and share the data and processes behind their analyses, the broader research community saves valuable time when reproducing or building upon published results.
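One small, concrete reproducibility habit for simulation-based work is recording the random seed, so anyone rerunning the analysis gets identical results. This is a minimal sketch, assuming a hypothetical coin-flip simulation (`simulate_coin_flips` is an illustrative name); it uses a dedicated `random.Random` generator so the result doesn't depend on other code that uses random numbers. Identical results are expected on the same Python version.

```python
import random


def simulate_coin_flips(num_flips, seed):
    """Return the proportion of heads in `num_flips` simulated fair coin flips."""
    rng = random.Random(seed)  # a dedicated, seeded generator
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


first_run = simulate_coin_flips(1000, seed=2023)
second_run = simulate_coin_flips(1000, seed=2023)
print(first_run == second_run)  # True: same seed, same result
```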

Examples

Tools of data science

Tools are necessary, but by themselves they are not data science.

Computational tools

Single-purpose tools vs. flexible / multipurpose tools

Data science curriculum

There are many options!

High school data science curricula

All great choices with similar learning outcomes; they use different tools and target different audiences.

But it's more than a single course

Working with data transcends disciplines and courses.

Where else to teach data science?

Math 3

AP Statistics

Post-exam module that uses larger, real-world datasets

Lab Sciences

Plotting results collected in a chemistry or biology lab.

Performing hypothesis testing using simulation

Engineering / CTE courses

Compare the effects of different design choices on a paper helicopter

How do I learn to teach data science?

You're already on your way…

Learning the content, tools, and pedagogy 🧑‍🏫

NC State University: InSTEP (Virtual)
Free, personalized professional learning to support teachers and instructional coaches in developing expertise in teaching K-12 statistics and data science.

Questions?

Activity

Wrangling and Tidying Data

Tables and "Tidy Data"

Source: Wickham, Hadley (20 February 2013). "Tidy Data." Journal of Statistical Software.

Artwork by @allison_horst
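In Wickham's tidy data framework, each variable forms a column, each observation forms a row, and each type of observational unit forms a table. A common tidying step is reshaping a "wide" table, where one variable is hidden in the column headers, into a "long" one. This is a minimal sketch in plain Python, similar in spirit to pandas' `melt`, assuming a hypothetical quiz table and a hypothetical helper named `to_long`.

```python
# Hypothetical "wide" table: one column per quiz, so the variable "quiz"
# lives in the column headers instead of in its own column.
wide = [
    {"student": "Ana", "quiz_1": 8, "quiz_2": 9},
    {"student": "Ben", "quiz_1": 7, "quiz_2": 10},
]


def to_long(rows, id_column, variable_name, value_name):
    """Reshape wide rows into tidy rows: one row per (id, variable) observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every new row instead
            long_rows.append({id_column: row[id_column], variable_name: column, value_name: value})
    return long_rows


tidy = to_long(wide, id_column="student", variable_name="quiz", value_name="score")
for row in tidy:
    print(row)
```

After reshaping, each row is one observation (one student on one quiz), which makes grouping, filtering, and plotting by quiz straightforward.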

  • A Table is a sequence of labeled columns
  • Each row represents one individual
  • Data within a column describe one attribute of the individuals and should all be of the same type

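Name       | Code | Area (mi²)
California | CA   | 163696
Nevada     | NV   | 110567

(Figure annotations: the header cells are labels, each vertical slice is a column, and each horizontal slice is a row.)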
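The "sequence of labeled columns" idea maps naturally onto a dictionary from labels to lists. This is a minimal sketch using the two rows of the example table (areas are total areas in square miles); the helper names `row` and `column_types_consistent` are hypothetical, not part of any particular table library.

```python
# The example table stored column by column: each label maps to one column.
states = {
    "Name": ["California", "Nevada"],
    "Code": ["CA", "NV"],
    "Area": [163696, 110567],  # total area, in square miles
}


def row(table, index):
    """Return one individual (row) as a dict of label -> value."""
    return {label: column[index] for label, column in table.items()}


def column_types_consistent(table):
    """Check that every value within each column has the same type."""
    return all(len({type(value) for value in column}) == 1 for column in table.values())


print(row(states, 0))
print(column_types_consistent(states))  # True
```

Storing data by column makes the "same type within a column" rule easy to check, while `row` shows how one individual is recovered by reading the same position from every column.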