PART 2
Computational & Statistical Thinking
Mathematical Foundation
KU PSYC 500
Jiin Jung
Course Objectives
Specifically, at the conclusion of the course,
1. Students will be able to:
Open source data
Datasets are provided by organizations
Subsetting
Slicing
Vectorized operation
Reshaping
Combining
Data visualization
Descriptive statistics
Suggesting hypotheses
Assessing assumptions for inferential statistics
Simulation-based statistics
Why Simulation?
“Resampling techniques are rapidly entering mainstream data analysis; some statisticians believe that resampling procedures will soon supplant common nonparametric procedures and may displace most parametric procedures as well.” (Berger, 2012)
Simulation
Generate a simulated dataset
A Replicate
Calculate a replicate from the simulated dataset
Iteration
Process
(10k~)
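The three steps above (generate a simulated dataset, calculate a replicate, iterate about 10,000 times) can be sketched as a short loop. This is a minimal illustration, not the course's own code: the coin-flip setup, the variable names, and the seed are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical setup: each simulated dataset is 20 flips of a fair coin.
rng = np.random.default_rng(seed=42)  # seeded so the run is reproducible
n_flips = 20
n_iterations = 10_000

# One slot per iteration to hold each replicate.
replicates = np.empty(n_iterations)

for i in range(n_iterations):
    # Step 1: generate a simulated dataset (0 = tails, 1 = heads).
    simulated = rng.integers(0, 2, size=n_flips)
    # Step 2: calculate a replicate (here, the proportion of heads).
    replicates[i] = simulated.mean()

# Step 3 is the loop itself; the replicates now form a sampling distribution.
print(replicates.mean())
```

The replicate can be any single-number summary (a mean, a median, a difference between groups); the proportion of heads is used here only because it is easy to check.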
Part 2
Computational & Statistical Thinking
Mathematical Foundation
Week 5: Data Ethics. NumPy
Data Ethics.
Iteration. For loop. Defining functions.
Week 6: Define Functions. Simulation
Iteration. For loop. Defining functions. Cumulative distribution function (CDF). Random number generation
>>> 2nd Project Announcement
Week 7: Statistical Inference
Exploratory data analysis (EDA). Statistical Inference. Probability and Uncertainty. Bernoulli Trials. Binomial Distribution.
Week 8: 2nd Project Presentation
Lied Center of Art.
Questions?
Intro to NumPy
Lec 11
Objectives
NumPy
NumPy is a library for the Python programming language.
Essential libraries and projects that depend on NumPy’s API gain access to new array implementations that support NumPy’s array protocols (Fig. 3).
Import NumPy
import numpy as np
Array: the basic data structure of NumPy
np.array()
ndarray
ndarray: an n-dimensional array
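A short sketch of how np.array() produces an ndarray, and how to check its number of dimensions and shape. The variable names one_d and two_d are illustrative only.

```python
import numpy as np

# np.array() builds an ndarray from a (possibly nested) Python list.
one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2], [3, 4], [5, 6]])

print(type(one_d))              # <class 'numpy.ndarray'>
print(one_d.ndim, one_d.shape)  # 1 (3,)
print(two_d.ndim, two_d.shape)  # 2 (3, 2)
```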
Sorting & adding elements
np.sort()
np.append()
np.concatenate()
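The three functions above might be used as follows. The example arrays are made up for illustration; note that each call returns a new array and leaves its inputs unchanged.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

# np.sort() returns a sorted copy; arr itself is not modified.
sorted_arr = np.sort(arr)
print(sorted_arr)  # [1 2 3 4 5 6 7 8]

# np.append() returns a new array with the values added at the end.
appended = np.append(arr, [9, 10])
print(appended)    # [ 2  1  5  3  7  4  6  8  9 10]

# np.concatenate() joins a sequence of arrays along an existing axis.
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
joined = np.concatenate((a, b))
print(joined)      # [1 2 3 4 5 6 7 8]
```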
Indexing & slicing elements - 1d array
data = np.array([1,2,3])
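A few indexing and slicing expressions on this 1d array, as a quick sketch. Indexing starts at 0, and the stop position of a slice is excluded.

```python
import numpy as np

data = np.array([1, 2, 3])

print(data[0])     # 1      (first element)
print(data[1])     # 2      (second element)
print(data[0:2])   # [1 2]  (indices 0 and 1; stop index 2 is excluded)
print(data[1:])    # [2 3]  (index 1 through the end)
print(data[-2:])   # [2 3]  (last two elements)
```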
Indexing & slicing elements - 2d array
data = np.array([[1,2],[3,4],[5,6]])
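For a 2d array, the index takes a row position and a column position separated by a comma. The expressions below are an illustrative sketch on the same array.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

print(data[0, 1])    # 2        (row 0, column 1)
print(data[1:3])     # rows 1 and 2: [[3 4] [5 6]]
print(data[0:2, 0])  # [1 3]    (rows 0 and 1, column 0)
print(data[:, 1])    # [2 4 6]  (every row, column 1)
```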
Selecting a subset
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[3:8]    # slice by index positions: [4 5 6 7 8]
c = a[a > 5]  # select with a boolean condition: [ 6  7  8  9 10]
Array operations
a = np.array([1,2,3])
b = np.array([4,5,6])
a + b
a - b
a * b
a / b
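These operators work element by element (a vectorized operation), so no loop is needed. A sketch with the results shown; the names total, difference, product, and quotient are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

total = a + b       # [5 7 9]
difference = a - b  # [-3 -3 -3]
product = a * b     # [ 4 10 18]
quotient = a / b    # [0.25 0.4  0.5 ]  (true division gives floats)

print(total, difference, product, quotient)
```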
Math formulas & summary statistics
np.square()
np.sum()
np.mean()
np.var()
np.std()
np.percentile()
np.corrcoef()
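The functions listed above, applied to a small made-up dataset. The arrays x and y are assumptions for illustration. Note that np.var() and np.std() divide by n by default (ddof=0); pass ddof=1 for the sample versions.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = 2 * x + 1  # hypothetical second variable, perfectly linear in x

print(np.square(x))          # [ 1  4  9 16 25]
print(np.sum(x))             # 15
print(np.mean(x))            # 3.0
print(np.var(x))             # 2.0  (population variance, ddof=0)
print(np.std(x))             # square root of the variance, about 1.414
print(np.percentile(x, 50))  # 3.0  (the median)

# np.corrcoef() returns a correlation matrix; [0, 1] is the x-y correlation.
r = np.corrcoef(x, y)[0, 1]
print(r)                     # approximately 1.0
```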
Questions?