1 of 46

Welcome to Data Science DISCOVERY!

Karle Flanagan

Wade Fagen-Ulmschneider

August 26, 2019

2 of 46

Introductions

No good party starts without introductions...

3 of 46

Introductions:

Karle Flanagan (kflan)

Senior Instructor of Statistics

College of Liberal Arts and Sciences

kmeanskarle

Wade Fagen-Ulmschneider (waf)

Teaching Associate Prof. of Computer Science

Grainger College of Engineering

profwade

4 of 46

Introductions: You!

5 of 46

“Hello” Survey

6 of 46

What to expect this semester?

10 of 46

Inferential Thinking (Statistics)

Computation

Business Intelligence

Data Science

11 of 46

What data is most important to many college students?

14 of 46

CRN,Course Subject,Course Number,Course Title,Course Section,Sched Type
41758,AAS,100,Intro Asian American Studies,AD1,DIS
47100,AAS,100,Intro Asian American Studies,AD2,DIS
47102,AAS,100,Intro Asian American Studies,AD3,DIS
51248,AAS,100,Intro Asian American Studies,AD4,DIS
51249,AAS,100,Intro Asian American Studies,AD5,DIS

Data Science Visualization Created at Illinois: Grade disparity between sections at UIUC
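The rows above are comma-separated values (CSV): each line is one record and the header row names the columns. As a minimal sketch, Python's standard csv module can read these rows by column name and count sections per course; pandas' read_csv does the same job with many more features, but this version runs without extra installs. The names SCHEDULE_CSV and count_sections are illustrative.

```python
import csv
import io
from collections import Counter

# The first rows of the course schedule shown above, copied verbatim.
SCHEDULE_CSV = """CRN,Course Subject,Course Number,Course Title,Course Section,Sched Type
41758,AAS,100,Intro Asian American Studies,AD1,DIS
47100,AAS,100,Intro Asian American Studies,AD2,DIS
47102,AAS,100,Intro Asian American Studies,AD3,DIS
51248,AAS,100,Intro Asian American Studies,AD4,DIS
51249,AAS,100,Intro Asian American Studies,AD5,DIS
"""

def count_sections(csv_text):
    """Count how many sections each (subject, number) course has."""
    # DictReader uses the header row as keys, so columns are accessed by name.
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["Course Subject"], row["Course Number"]) for row in reader)

sections = count_sections(SCHEDULE_CSV)
print(sections)  # Counter({('AAS', '100'): 5})
```

The same idea scales to the full schedule file: once the data has labeled columns, questions like "how many sections does each course have?" become a one-line count.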

15 of 46

Data Science Tool: Jupyter

An interactive programming tool, optimized for working through data science questions.

Main benefit: "Jupyter notebooks" let us program and document our process in the same interactive file.
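Under the hood, a Jupyter notebook is saved as a JSON file (extension .ipynb) whose list of cells mixes Markdown cells (documentation) and code cells (programs). The sketch below builds a minimal two-cell notebook with only the standard library, assuming the nbformat version 4 layout; the file name hello.ipynb and the function make_notebook are hypothetical. In practice you would create notebooks from the Jupyter interface rather than by hand.

```python
import json

def make_notebook(markdown_text, code_text):
    """Return a minimal nbformat-4 notebook with one Markdown cell and one code cell."""
    return {
        "cells": [
            # A Markdown cell documents the analysis in prose.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # A code cell holds Python that Jupyter can run; outputs start empty.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

nb = make_notebook("# My first analysis\nWhat data matters most to students?", "print(1 + 1)")

# Save the notebook as JSON; Jupyter should be able to open the resulting file.
with open("hello.ipynb", "w") as f:
    json.dump(nb, f, indent=1)
```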

16 of 46

Data Science Tool: Pandas

Python Data Analysis Library:

At its core, pandas provides Excel-like access to datasets. It can quickly become much more powerful than Excel, particularly for large datasets.

17 of 46

Data Science Knowledge: Statistics

How can we know if A is better than B?

How can we compare A with B?

How do we do this under uncertainty?

18 of 46

Course Resources

19 of 46

Course Resources

1. Course Website: http://courses.las.illinois.edu/fall2019/stat107/

  • Google Search: “STAT 107 uiuc”
  • Short URL: http://go.illinois.edu/stat107
  • Contains the syllabus, lecture notes, schedule, etc.

20 of 46

Course Resources

2. Piazza: http://piazza.com/illinois/fall2019/stat107

  • Q & A Forum monitored by course staff
  • Ask questions here!
  • You can answer your peers' questions if you know the answer
  • Join our Stat 107 Piazza ASAP!

21 of 46

Course Resources

3. Compass 2g: https://compass2g.illinois.edu

  • Grades will be posted weekly on Compass
  • Any emails that we send will also be posted on Compass
  • iClicker registration is located on Compass

22 of 46

Important Things to Know: Course Staff

Teaching Assistants (TA)

  • STAT 107 TAs are among the best of the graduate students studying Computer Science and Statistics.
  • You will meet your TA in your lab section this week!
  • Your TA will lead your lab section.
  • Your TA will be your first point of contact if you have any questions about the course -- get to know them, they should know you by name!

23 of 46

Important Things to Know: Course Staff

Course Aides (CAs):

  • CAs are undergraduates who took the pilot of this course last semester, loved it, and are here to help you succeed this semester!
  • You will see CAs helping in lab sections, office hours, and on Piazza.
  • CAs are from majors all across campus!

There are a lot of people here to help you succeed in Data Science!! If you get stuck, reach out and learn something! :)

24 of 46

Important Things to Know: Office Hours

Professor Office Hours:

  • Every Wednesday, 8:30am - 10:00am in 2215 Siebel Center (Prof. Wade’s office), starts this Wednesday
  • At least one of the professors will always be there, usually both of us!

Open Office Hours:

  • Open office hours weekly on Monday, Wednesday, Thursday, and Friday in 23 Illini Hall (basement computer lab)
  • Open Office Hours start in Week #2 (schedule TBA)

25 of 46

Important Things to Know: Lecture

  1. Lecture Notes:
    • We will give you a handout (like the one you got today!) every day when you arrive at lecture. Taking notes is key to succeeding in this course.
    • We’ll also post PDFs of the lecture notes online before each class.
  2. Lecture Format
  3. Lecture Seating
  4. Lecture Behavior (is attendance mandatory?)

26 of 46

Important Things to Know: Lab

  • Labs are weekly experiences where you will do Data Science in groups, with the help of TAs and CAs!
    • Labs start this week!
  • Laptop:
    • Sections AYD and AYL: Computer Lab, no laptop needed
    • All Other Sections: BYOD, laptop computer required
    • If you do not have a laptop, make sure to be in AYD or AYL!
  • 15 points/week: 10 points for the lab, 5 points for attendance

27 of 46

Important Things to Know: Grading

Course grades are given in points; there are a total of 1,000 points throughout the semester:

Labs (180 points) + Homework (275 points) + Project (145 points) + Exams (400 points) = 1,000 points

28 of 46

Important Things to Know: Grading

  • Earning 930 points guarantees you an A in the class (4.0 GPA)
  • Earning 900 points guarantees you an A- in the class
  • Earning 870 points guarantees you a B+ in the class
  • Earning 830 points guarantees you a B in the class
  • Earning 800 points guarantees you a B- in the class
  • ...and so forth...
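Because each cutoff above is a minimum, finding the guaranteed grade means checking the cutoffs from highest to lowest and stopping at the first one a score meets. A minimal sketch using only the cutoffs listed on this slide (the function name guaranteed_grade is hypothetical, and grades below B- are intentionally left to the course website rather than guessed):

```python
# Guaranteed-grade cutoffs listed on the slide, highest first.
# Lower cutoffs ("...and so forth...") are on the course website and are not filled in here.
CUTOFFS = [(930, "A"), (900, "A-"), (870, "B+"), (830, "B"), (800, "B-")]

def guaranteed_grade(points):
    """Return the highest letter grade guaranteed by `points`, or None below 800."""
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    return None  # below 800: consult the course website

print(guaranteed_grade(915))  # A-
print(guaranteed_grade(930))  # A
```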

30 of 46

Important Things to Know: Grading

Course grades are given in points; there are a total of 1,000 points throughout the semester:

Labs (180 points) + Homework (275 points) + Project (145 points) + Exams (400 points) = 1,000 points + 107 Extra Credit Points (max) = 1,107 possible points!!!
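The arithmetic on this slide can be checked in a few lines: the four categories sum to 1,000, and extra credit adds at most 107. This sketch assumes the "(max)" means any extra credit beyond 107 simply does not count; the function name course_total and the student scores in the example are hypothetical.

```python
# Point values from the slide.
CATEGORY_MAX = {"Labs": 180, "Homework": 275, "Project": 145, "Exams": 400}
EXTRA_CREDIT_MAX = 107

assert sum(CATEGORY_MAX.values()) == 1000  # the four categories add up to 1,000

def course_total(category_points, extra_credit):
    """Sum regular points and extra credit, counting at most 107 extra-credit points (assumed cap)."""
    return sum(category_points.values()) + min(extra_credit, EXTRA_CREDIT_MAX)

# Hypothetical student: 885 regular points plus 40 extra-credit points.
student = {"Labs": 165, "Homework": 250, "Project": 130, "Exams": 340}
print(course_total(student, 40))         # 925
print(course_total(CATEGORY_MAX, 200))   # 1107 (extra credit capped at 107)
```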

31 of 46

Important Things to Know: Extra Credit

You can earn +1 extra credit point every lecture:

  1. iClicker Points:

Details: We will ask iClicker questions throughout each class starting this Wednesday. You will get +1 if you answer all of the questions that day. Register your iClicker on Compass 2g ASAP!

32 of 46

Important Things to Know: Extra Credit

You can earn +1 extra credit point after every lecture:

2. Extra Credit (EC) Python Notebooks:

Details: After each lecture, there will be a short Jupyter notebook available to practice and apply the programming techniques that you learned in class for +1. Each notebook is due at the start of the next lecture (12:00 noon).

33 of 46

Important Things to Know: Extra Credit

You can earn +1 extra credit point most weeks:

3. Course-wide Surveys:

Details: Throughout the semester, we will have extra credit surveys available for you to complete! We will analyze this data together in class :)

34 of 46

Important Things to Know: Grading

See the course website for details on the following topics:

  • Do we accept late work?
  • Do we round grades?
  • How many points give you a certain letter grade?
  • ...etc.

35 of 46

Important Things to Know: Exams

All exams (including the final exam) will take place in the Computer Based Testing Facility (CBTF):

  • You choose and sign up for when you want to take your exam, within a 3-day window!
  • The exam is computer-based, but the computer doesn’t have Internet access.
  • Exam Schedule:
    • Exam 1 (50 min): Monday, Sept. 30 - Wednesday, Oct. 2
    • Exam 2 (50 min): Monday, Nov. 11 - Wednesday, Nov. 13
    • Final Exam: During Exam Week

36 of 46

Data Science Tools

37 of 46

Data Science Tools

There are two broad categories of data:

Structured Data and Unstructured Data

39 of 46

Data Science Tools

Structured data refers to data that has been organized and categorized in a well-defined format, or can easily be organized in such a way.

Example: Spreadsheet, organized by row/column, w/ column labels
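The phrase "or can easily be organized in such a way" is worth a concrete look: text that follows a regular pattern can often be turned into labeled rows with a few lines of code. A minimal sketch using Python's standard re module: the pattern is written for the one point-breakdown sentence from the grading slide in this deck and is not a general-purpose parser; the variable names are illustrative.

```python
import re

# A sentence of free text, taken from the grading slide.
text = "Labs (180 points) + Homework (275 points) + Project (145 points) + Exams (400 points)"

# Each match captures a category name followed by "(<number> points)".
pattern = re.compile(r"(\w+) \((\d+) points\)")

# Organize the matches as rows with column labels, like a small spreadsheet.
rows = [{"category": name, "points": int(pts)} for name, pts in pattern.findall(text)]

for row in rows:
    print(row["category"], row["points"])
```

Once the sentence has become rows with the labels "category" and "points", it behaves like the spreadsheet example: you can sort, filter, or total it directly.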

40 of 46

Data Science Tools

Unstructured data refers to all other data (not organized, or not categorized, or not in a well-defined format).

Example:

Screenshot of Data Science Discovery webpage

Retrieved: August 24, 2019

http://courses.las.illinois.edu/fall2019/stat107/

41 of 46

Data Science Tools

Structured Data or Unstructured Data?

Spreadsheet of UIUC student demographics.

42 of 46

Data Science Tools

Structured Data or Unstructured Data?

Screenshot of Data Science Discovery webpage

Retrieved: August 24, 2019

http://courses.las.illinois.edu/fall2019/stat107/

43 of 46

Data Science Tools

Structured Data or Unstructured Data?

“University pilots new data science course”

By Sarah O’Beirne, Daily Illini

https://dailyillini.com/features/2018/11/28/university-pilots-new-data-science-course/

44 of 46

Data Science Tools

Structured Data or Unstructured Data?

"Walkin' through a quad, the campus is aglow
Kaleidoscope of loud heartbeats under coats
Everybody here wanted somethin' more
Searchin' for a class we hadn't heard before"

45 of 46

Data Science Tools

Structured Data or Unstructured Data?

Illini Football Scores last season

46 of 46

Data Science Tools

Structured Data or Unstructured Data?

The return addresses on all the mail you have received.