1 of 21

Data

Maths Learning Centre

University of Adelaide

Semester 1 2023

APPROACHES TO CULTURE

Please have a look at the cards on the table and talk with others about what you notice.

2 of 21

Main points

  • The Maths Learning Centre are friendly people to talk with
  • Data helps you see patterns you can’t see yourself, and helps you unsee patterns you thought you could see
  • You need your understanding of context and story to make sense of data
  • Data terminology to make it easier to talk about data
  • The data terminology “significant” has a very specific meaning which is not the same as important.

3 of 21

Maths Learning Centre

  • Safe place to talk about anything mathematical related to any course.
  • No maths is more or less worthy to discuss
  • 10am to 4pm, Monday to Friday
  • Level 3 Hub Central, or via Zoom
  • www.adelaide.edu.au/mathslearning

4 of 21

ACTIVITY

The cards show information about the 52 Disney theatrical release animated feature films (which have a single story) up to 2022.

What do you notice?

5 of 21

Terminology

  • SUBJECT: Whatever things you write information down about (eg: Disney films)
  • VARIABLE: A specific type of information you write down (eg: Running time, Australian rating)
  • CATEGORICAL VARIABLE: A variable whose options for each subject are categories (eg: Australian rating, Month)
  • NUMERICAL VARIABLE: A variable whose options for each subject are single numbers (eg: Running time, Number of songs sung by characters)

6 of 21

ACTIVITY

Think of some new variables you could write down about these subjects. Decide if they’re categorical or numerical.

7 of 21

ACTIVITY

Organise the cards to investigate the month in which a Disney film was released.

8 of 21

Terminology

  • BAR CHART: A chart that shows how many or what percentage of subjects are in each category.

9 of 21

ACTIVITY

Organise the cards to investigate the running time of Disney films.

10 of 21

Terminology

  • HISTOGRAM: Like a bar chart, but the categories are ranges of possible numbers from a numerical variable.
  • BINWIDTH: How wide the bars are on a histogram.
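
To make the link between a histogram and its binwidth concrete, here is a minimal Python sketch that sorts numbers into bins of a chosen width and counts them. The function name `histogram_counts` and the running times are hypothetical, chosen only for illustration; they are not the values on the cards.

```python
from collections import Counter

def histogram_counts(values, binwidth, start=0):
    """Count how many values fall in each bin [left, left + binwidth)."""
    counts = Counter()
    for v in values:
        # Left edge of the bin that contains v.
        left = start + ((v - start) // binwidth) * binwidth
        counts[left] += 1
    return dict(sorted(counts.items()))

# Hypothetical running times in minutes (illustrative only, not the card data).
running_times = [64, 72, 75, 78, 81, 83, 88, 90, 95, 102]

# Each row is one bar: left edge included, right edge excluded.
for left, n in histogram_counts(running_times, binwidth=10).items():
    print(f"{left} to under {left + 10}: {'#' * n}")
```

Changing `binwidth` from 10 to 5 or 20 regroups the same numbers into narrower or wider bars, which can change how the shape of the distribution looks.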

11 of 21

Terminology

  • DISTRIBUTION: A description of how likely subjects are to have the various possible numbers in a numerical variable.
  • SKEWED: A distribution is skewed when subjects are more spread out at one end compared to the other. It’s skewed positively when the bigger end is more spread out; it’s skewed negatively when the smaller end is more spread out.

12 of 21

ACTIVITY

Make two separate histograms of running time, one for G films and one for PG films.

What can you say about the relationship between rating and running time?

13 of 21

ACTIVITY

14 of 21

Terminology

MEDIAN: The number that has half the subjects before it and half the subjects after it.

MEAN: The number that each subject would have if you found the total and shared it evenly among all the subjects.

STANDARD DEVIATION: A number to show how spread out the subjects are in a numerical distribution. (Approximately the average distance between everything and the mean.)

Median times: PG: 101 mins, G: 79 mins

Mean times: PG: 98.1 mins, G: 80.2 mins

SD times: PG: 9.79 mins, G: 7.71 mins
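
As a rough illustration of the three definitions above, the following Python sketch uses the standard-library `statistics` module. The five running times are hypothetical placeholders, not the Disney data, so the results will not match the figures on this slide.

```python
from statistics import mean, median, stdev

# Hypothetical running times in minutes (illustrative only).
times = [70, 75, 80, 85, 90]

middle = median(times)   # half the subjects before it, half after it
average = mean(times)    # total shared evenly among all the subjects
spread = stdev(times)    # sample standard deviation

# The plain average distance from the mean, for comparison with the SD.
average_distance = mean(abs(t - average) for t in times)

print(f"median = {middle}, mean = {average}")
print(f"SD = {spread:.2f}, average distance from mean = {average_distance}")
```

The standard deviation and the plain average distance are different calculations, which is why the slide describes the SD as only approximately the average distance from the mean.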

15 of 21

ACTIVITY

Could the difference we see between G and PG films just be random?

Shuffle your cards well and deal out 17 in one pile and the remaining 35 in a second pile.

Create two separate histograms for running time for your two new groups.

Compare to how it turned out for G and PG films.
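
The shuffling activity can also be repeated many times on a computer. The sketch below is one possible way to do that, assuming the statistic of interest is the difference in mean running time between the two piles; the function names and the placeholder numbers are hypothetical and are not the card data.

```python
import random
from statistics import mean

def shuffle_differences(values, first_pile_size, repeats=1000, seed=0):
    """Shuffle, deal two piles, and record the difference in pile means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cards = list(values)
    diffs = []
    for _ in range(repeats):
        rng.shuffle(cards)
        pile_a = cards[:first_pile_size]
        pile_b = cards[first_pile_size:]
        diffs.append(mean(pile_a) - mean(pile_b))
    return diffs

def share_at_least(diffs, observed):
    """Fraction of shuffles whose difference is at least as big as observed."""
    return sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)

# Placeholder running times for 52 cards (NOT the real Disney data).
placeholder_times = list(range(60, 112))

diffs = shuffle_differences(placeholder_times, first_pile_size=17)
# 17.9 is the difference between the PG and G means quoted on slide 14.
print(share_at_least(diffs, observed=17.9))
```

If random shuffles rarely produce a difference as large as the one seen between the real groups, that is evidence the real difference is not just random.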

16 of 21

Terminology

TEST STATISTIC: A number you calculate from the data to help decide a yes-or-no question about a specific number or relationship.

P-VALUE: A probability you calculate from the test statistic to compare your data to what could have been if a specific yes-or-no answer were true.

SIGNIFICANT: The p-value is low (under 0.05), so there is evidence to suggest a difference or relationship is there (but it’s not the same as important and it can’t show cause).

“There is a significant difference in run time on average between G and PG Disney films (t(26)=6.69, p<0.0001)”
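
For readers curious where a statement like t(26) can come from, here is a hedged sketch of one common test statistic for comparing two means, Welch's unequal-variances t. It is an assumption that this is the test behind the quoted result, and it is also an assumption that the PG group has 17 films and the G group 35 (the pile sizes from the shuffling activity). The inputs are the rounded means and SDs from slide 14.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Assumed group sizes: 17 PG films and 35 G films.
t, df = welch_t(98.1, 9.79, 17, 80.2, 7.71, 35)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions the degrees of freedom come out close to 26 and the t-value lands in the same region as the quoted 6.69, though not identical to it; the exact inputs and method behind the quoted result are not shown on the slides.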

17 of 21

ACTIVITY

Organise your cards to investigate the relationship between year of release and running time.

18 of 21

Terminology

SCATTERPLOT: A graph where each subject is a dot lining up with a number on two different axes.

19 of 21

Terminology

CORRELATION COEFFICIENT: A number between -1.00 and 1.00 to say how close to being a straight-line relationship something is. Closer to 1.00 or -1.00 is a stronger relationship; closer to 0 is a weaker one.

SIGNIFICANT: The p-value is low (under 0.05), so there is evidence to suggest a difference or relationship is there (but it’s not the same as important and it can’t show cause).

“There is a significant linear relationship between year of release and running time for Disney films (r = 0.685 , t(50)=6.65, p<0.0001)”
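
The numbers in this statement are connected by a standard formula: for n subjects, the test statistic for a correlation is t = r × √(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below checks that using r = 0.685 and the 52 films; the function names are hypothetical, and the small `pearson_r` demonstration uses made-up points on a perfect straight line, not the card data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Made-up points lying exactly on a straight line give r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))

# Values from the slide: r = 0.685 for 52 films.
print(f"t({52 - 2}) = {correlation_t(0.685, 52):.2f}")  # t(50) = 6.65
```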

20 of 21

ACTIVITY

Investigate whatever you like with the cards. You might like to choose a couple of variables and arrange the cards to see how they are related.

When you’re happy with your arrangement, have a look at what the other groups have done.

21 of 21

ACTIVITY

Choose something you want to remember.