Welcome to Data Science DISCOVERY!
Karle Flanagan
Wade Fagen-Ulmschneider
August 26, 2019
Introductions
No good party starts without introductions...
Karle Flanagan (kflan)
Senior Instructor of Statistics
College of Liberal Arts and Sciences
kmeanskarle
Wade Fagen-Ulmschneider (waf)
Teaching Associate Prof. of Computer Science
Grainger College of Engineering
profwade
Introductions: You!
“Hello” Survey
What to expect this semester?
Inferential Thinking (Statistics)
Computation
Business Intelligence
Data Science
What data is most important to many college students?
CRN,Course Subject,Course Number,Course Title,Course Section,Sched Type
41758,AAS,100,Intro Asian American Studies,AD1,DIS
47100,AAS,100,Intro Asian American Studies,AD2,DIS
47102,AAS,100,Intro Asian American Studies,AD3,DIS
51248,AAS,100,Intro Asian American Studies,AD4,DIS
51249,AAS,100,Intro Asian American Studies,AD5,DIS
Data Science Visualization Created at Illinois: Grade disparity between sections at UIUC
Data Science Tool: Jupyter
An interactive programming tool, optimized for exploring data science questions.
Main benefit: “Jupyter notebooks” let us program and document our process in the same interactive file.
Data Science Tool: Pandas
Python Data Analysis Library:
At its core, pandas provides Excel-like access to datasets. It can quickly become much more powerful than Excel, particularly for large datasets.
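A minimal sketch of what "Excel-like access" can look like, assuming pandas is installed and using the course-schedule rows shown earlier in these slides. The variable names (`csv_text`, `df`, `sections`, `high_crns`) are illustrative and not from the course materials.

```python
import io

import pandas as pd

# Sample rows copied from the course-schedule CSV shown earlier in the slides.
csv_text = """CRN,Course Subject,Course Number,Course Title,Course Section,Sched Type
41758,AAS,100,Intro Asian American Studies,AD1,DIS
47100,AAS,100,Intro Asian American Studies,AD2,DIS
47102,AAS,100,Intro Asian American Studies,AD3,DIS
51248,AAS,100,Intro Asian American Studies,AD4,DIS
51249,AAS,100,Intro Asian American Studies,AD5,DIS
"""

# read_csv accepts a file path or a file-like object; StringIO stands in for a file here.
df = pd.read_csv(io.StringIO(csv_text))

# Spreadsheet-style operations: count rows, pick a column by its label, filter rows.
num_rows = len(df)
sections = df["Course Section"].tolist()
high_crns = df[df["CRN"] > 50000]

print(num_rows)
print(sections)
print(high_crns)
```

The same three operations (count, select a column, filter) stay one line each whether the file has five rows or five million, which is where the comparison with Excel starts to matter.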
Data Science Knowledge: Statistics
How can we know if A is better than B?
How can we compare A with B?
How do we do this under uncertainty?
Course Resources
1. Course Website: http://courses.las.illinois.edu/fall2019/stat107/
2. Piazza: http://piazza.com/illinois/fall2019/stat107
3. Compass 2g: https://compass2g.illinois.edu
Important Things to Know: Course Staff
Teaching Assistants (TAs):
Course Aides (CAs):
There are a lot of people here to help you succeed in Data Science!! If you get stuck, reach out and learn something! :)
Important Things to Know: Office Hours
Professor Office Hours:
Open Office Hours:
Important Things to Know: Lecture
2. Lecture Format
3. Lecture Seating
4. Lecture Behavior (is attendance mandatory?)
Important Things to Know: Lab
Important Things to Know: Grading
Course grades are given in points; there are a total of 1,000 points throughout the semester:
Labs (180 points)
+ Homework (275 points)
+ Project (145 points)
+ Exams (400 points)
= 1,000 points
+ 107 Extra Credit Points (max)
= 1,107 possible points!!!
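The point totals above can be checked with a few lines of Python. This is only an illustrative sketch: the point values come from the slide, while the names `components` and `max_extra_credit` are made up for the example.

```python
# Point values taken from the grading slide.
components = {"Labs": 180, "Homework": 275, "Project": 145, "Exams": 400}
max_extra_credit = 107

# Base points, then the maximum possible total once extra credit is included.
base_total = sum(components.values())
max_total = base_total + max_extra_credit

# Show each component's share of the 1,000 base points.
for name, points in components.items():
    share = points / base_total
    print(f"{name}: {points} points ({share:.1%} of the base total)")

print(base_total, max_total)
```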
Important Things to Know: Extra Credit
You can earn +1 extra credit point every lecture:
Details: We will ask iClicker questions throughout each class starting this Wednesday. You will get +1 if you answer all of the questions that day. Register your iClicker on Compass 2g ASAP!
You can earn +1 extra credit point after every lecture:
2. Extra Credit (EC) Python Notebooks:
Details: After each lecture, there will be a short Jupyter notebook available to practice and apply the programming techniques that you learned in class for +1. Each notebook is due at the start of the next lecture (12:00 noon).
You can earn +1 extra credit point most weeks:
3. Course-wide Surveys:
Details: Throughout the semester, we will have extra credit surveys available for you to complete! We will analyze this data together in class :)
Important Things to Know: Grading
See the course website for details on the following topics:
Important Things to Know: Exams
All exams (including the final exam) will take place in the Computer Based Testing Facility (CBTF):
Data Science Tools
There are two broad categories of data:
Structured Data and Unstructured Data
Structured data refers to data that has been organized and categorized in a well-defined format, or can easily be organized in such a way.
Example:
Spreadsheet, organized by row/column, w/ column labels
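To make "organized by row/column, w/ column labels" concrete, here is a small sketch using only Python's standard library and two of the course-schedule rows shown earlier in these slides. The variable names are illustrative. Because the data is structured, every value can be reached by a row position and a column label.

```python
import csv
import io

# Two rows from the course-schedule CSV shown earlier, plus the header row of column labels.
csv_text = """CRN,Course Subject,Course Number,Course Title,Course Section,Sched Type
41758,AAS,100,Intro Asian American Studies,AD1,DIS
47100,AAS,100,Intro Asian American Studies,AD2,DIS
"""

# DictReader uses the first line as column labels and yields one dict per row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)
column_labels = reader.fieldnames

# Look up a single cell by row position and column label.
first_section = rows[0]["Course Section"]

print(column_labels)
print(first_section)
```

Unstructured data, such as a screenshot or the text of a news article, has no column labels to look values up by, so this kind of direct access is not available without extra processing.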
Unstructured data refers to all other data (not organized, or not categorized, or not in a well-defined format).
Example:
Screenshot of Data Science Discovery webpage
Retrieved: August 24, 2019
http://courses.las.illinois.edu/fall2019/stat107/
Structured Data or Unstructured Data?
Spreadsheet of UIUC student demographics.
Structured Data or Unstructured Data?
Screenshot of Data Science Discovery webpage
Retrieved: August 24, 2019
http://courses.las.illinois.edu/fall2019/stat107/
Structured Data or Unstructured Data?
“University pilots new data science course”
By Sarah O’Beirne, Daily Illini
https://dailyillini.com/features/2018/11/28/university-pilots-new-data-science-course/
Structured Data or Unstructured Data?
"Walkin' through a quad, the campus is aglow
Kaleidoscope of loud heartbeats under coats
Everybody here wanted somethin' more
Searchin' for a class we hadn't heard before"
Structured Data or Unstructured Data?
Illini football scores from last season
Structured Data or Unstructured Data?
The return addresses on all the mail you have received.