But, It’s A Number So It Has To Be True…Right?:

Introduction to Statistical Literacy, Part II

Lynette Hoelter

ICPSR, University of Michigan

Thursday, July 15, 2016, 9 a.m.

(go to http://dataliteracy.si.umich.edu/conference for the schedule, session title, etc.)

This slide is up while participants are entering the room.

Be sure to run the Audio Wizard to check your mic as soon as you enter the room. We do not recommend you use video for this webinar – our philosophy is to keep things as simple as possible.

Some moderators and presenters like to place their phone or a stereo speaker next to their mic to play background music (optional).

Please start the webinar ON TIME (do not wait for latecomers as we would with f2f PD).

Moderator starts recording.

MODERATOR: “Hello and welcome to this session. Today’s session is called <title> and will be presented by <presenter name> of <institutional affiliation>. My name is <moderator name>, and I will be your moderator today. I’ll be monitoring the chat and keeping a list of questions that arise that I’ll share with the presenter during the Q&A time.” [Note: some presenters prefer to answer questions as they arise – that’s also fine. Just work it out with your presenter in advance.]

ADVANCE SLIDE

Sponsors

MODERATOR:

The 4T Data Literacy conference is a satellite conference of the 4T Virtual Conference, an educational technology conference hosted by the University of Michigan Schools of Education and Information.

The 4T Data Literacy conference is a project of the University of Michigan School of Information and the University of Michigan Library. This project is made possible in part by the Institute of Museum and Library Services.

ADVANCE SLIDE

SCECHs

Log in with your full name

Attend at least 3 live sessions and up to 12

Submit your form by July 22!

More information:

http://dataliteracy.si.umich.edu/scech

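MODERATOR: If you are registered with the Michigan Department of Education’s online continuing education credits program, you are eligible to apply for SCECH credits. To be eligible, you must log in with your full name, attend at least 3 live sessions (up to 12), and submit your form by July 22.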

I’m posting the URL where you can find details about SCECH credits and download the form you need to submit.

<Post this URL: http://dataliteracy.si.umich.edu/scech >

ADVANCE SLIDE

http://dataliteracy.si.umich.edu/conference

#4tvirtualcon

MODERATOR: Within 48 hours of this session, you will be able to find links to the archived recordings of this conference at the URL shown here. If you enjoy this conference, we hope you will tweet about it using the hashtag shown on your screen.

<post URL and hashtag in chat window>

ADVANCE SLIDE

MODERATOR: You are invited to join the other 4T satellite conference, the free 4T Digital Writing conference, in October. I’m posting the URL in the chat window.

<http://4tdwvirtualcon.com>

ADVANCE SLIDE

chat

magic wand

raise hand

polling

MODERATOR: Let’s take a quick look at the tools you can use to participate actively in the webinar. 4T conferences are known for the savvy, engaged conversations not only with the presenter and moderator but with one another.

ADVANCE SLIDE

Use Magic Wand Tool

MODERATOR: Let’s see where everyone is from. If you’re from Michigan, mark your location on the county map in the corner. Everyone else, mark your state. If you’re from another country, please click on the logo.

<wait about 10 seconds, make small talk about where people are from, and as the number of icons appearing tapers off, thank people and say, now I’d like to turn the webinar over to <presenter name>.>

<ADVANCE SLIDE>

<MODERATOR TURNS OFF MIC>

<PRESENTER TURNS ON MIC>

Lynette Hoelter
Presenter

MODERATOR INTRODUCES PRESENTER OR PRESENTER INTRODUCES HERSELF

What Do You Teach?

HS Language Arts

HS Social Studies

HS Science

HS Other Subjects

HS Math or CS

Administrator

Higher Ed Faculty

HS Librarian

Higher Ed Librarian

Other

In order to help me get a better sense of who we have in the room, would you tell me a little about yourselves? Let’s begin with what subject(s) you teach – now is the time to practice using the magic wand tool that Angie had you try a few minutes ago.

How comfortable are you with concepts of statistics?

Extremely comfortable

Extremely uncomfortable

No big deal.

Extremely uncomfortable = Running for the door

Extremely comfortable = wake up thinking about ways you might use statistics today

Session Goals

  • Recap of Part I
  • Key statistics concepts:
    • Variables
    • Percentages, rates, etc.
    • Averages
    • Sampling
    • Margin of Error/Confidence
    • Correlation
    • Significance
  • Some sources of data and examples

Here, the presenter starts her presentation. You are free to use the sample templates included here or to use your own. It’d be great if you could use the logo on each slide to help us brand things (people, more and more, now screenshot webinar slides and post to social media, so this helps people know it’s from our conference).

Big things to keep in mind:

Refer to the GDoc for suggestions on webinars: https://docs.google.com/document/d/1DlilwPikxwGXiGAYAQcORe-cXnhcb11NyyNOKBwVTlY/edit?usp=sharing

Recap of Part I

  • Statistical literacy is a mindset; core concepts from statistics help build it
  • Ability to think critically about numbers in the “wild”
  • “Data” can mean a single number, a representation of results, or the information underlying that representation
  • Building students’ SL skills does not have to be an onerous undertaking

Key Statistical Concepts

Note that this is part 2 – order presented as in most statistics texts, but importance for QL identified

Yesterday’s concepts: variables, percentages/rates, measures of central tendency. Today: Sampling, margin of error/confidence, correlation, significance.

Key Concepts: Sampling

    • What we know comes from who is “measured”
      • Necessary to know who is in and who is out of sample population
      • What is target population (group to whom generalization is desired)?
    • Size isn’t the only thing that matters
    • Representation of a larger group requires randomness (probability designs)
      • Is sampling frame representative?

Just as we began yesterday by thinking about variables – what was measured, by whom, how, and why – today we’ll start by thinking about WHO was included in the measurement. If the data come from a survey, who was asked (and who answered, not always the same thing) the questions? How were they chosen?

Some people think that large samples mean you can believe or generalize the results (differing opinions on what “large” means), but that isn’t entirely true. How the sample is selected is just as important as the size.

Easiest sample – simple random (pulling names out of a hat); in reality, harder to pull off. Complex sample designs. Sampling frame can detract from generalizability if a certain sub-population is missing from the list and those individuals are different from the ones on the list – e.g., phone polling: lists using only landlines vs. those that include cell phones.
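To make the point about sample size versus sample selection concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: a hypothetical population of 100,000 adults, 40% of whom have a landline, with support for some measure at 60% among landline owners and 40% among everyone else. It compares a small simple random sample with a much larger sample drawn from a landline-only frame.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (all rates are assumptions, not real data).
population = []
for _ in range(100_000):
    has_landline = random.random() < 0.40
    supports = random.random() < (0.60 if has_landline else 0.40)
    population.append((has_landline, supports))


def share_supporting(people):
    """Fraction of the given people who support the measure."""
    return sum(supports for _, supports in people) / len(people)


true_share = share_supporting(population)

# Small simple random sample drawn from the whole population.
small_random = random.sample(population, 1_000)

# Much larger sample, but the frame lists only landline owners.
landline_frame = [person for person in population if person[0]]
large_biased = random.sample(landline_frame, 20_000)

print(f"True share:               {true_share:.3f}")
print(f"Random sample (n=1,000):  {share_supporting(small_random):.3f}")
print(f"Landline-only (n=20,000): {share_supporting(large_biased):.3f}")
```

Under these assumptions the small random sample should land close to the true share, while the landline-only sample, despite being twenty times larger, reflects only the people on the list.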

Key Concepts: Sampling

    • Results from convenience, quota, snowball samples cannot be generalized
      • Many polling organizations use very large opt-in internet panels with random selection from within panel
      • Market research tends to be more “okay” with large non-random samples
    • Other methodological issues worth thinking about: mode of data collection, incentives, question wording/presentation

Mode of data collection: web vs phone – how people respond, information that is given to an interviewer vs self-administered

Incentives: large enough to bias sample? Some opt-in panels pay – people sign up with multiple “personas”

Question wording/presentation: Words used, but also other cues. Research on Web surveys with questions about overall health – changing the picture accompanying the text, even though not referenced, changed responses. Opinion questions with no option for “I don’t know” force people to choose even when they haven’t thought about something before.

Key Concepts: Margin of Error

    • “Bounds” around an estimate, forming an interval where the true value is likely to lie
    • Different levels of certainty = different sizes of intervals
      • Higher certainty, larger interval
      • Lower certainty, smaller interval
    • When comparing two scores, if the intervals overlap, cannot be confident there is a real difference between the two scores

Especially common on opinion polls, election data.
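A short sketch of why higher certainty means a larger interval. It uses the common normal-approximation formula for the margin of error of a sample proportion, which is an assumption here (real polls often adjust for weighting and design effects). The poll numbers are hypothetical: 50% support among 1,000 respondents.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


p, n = 0.50, 1_000  # hypothetical poll result and sample size
for confidence in (0.90, 0.95, 0.99):
    moe = margin_of_error(p, n, confidence)
    print(f"{confidence:.0%} confidence: {p:.0%} ± {moe:.1%}")
```

With these inputs the margin grows from roughly 2.6 points at 90% confidence to roughly 4.1 points at 99% confidence, while the sample itself has not changed.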

Data from Huffpost Pollster

URL: http://elections.huffingtonpost.com/pollster/polls/yougov-economist-24874

For the Economist poll, this means the “real value” is between 37.8% and 44.2% for Clinton and between 34.8% and 41.2% for Trump. The intervals overlap: “too close to call.”
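The overlap check can be sketched in a few lines. The point estimates (41% and 38%) and the ±3.2 margin are inferred from the ranges quoted in the notes, not taken from the poll's own documentation; the helper names are hypothetical.

```python
def interval(estimate, margin):
    """Return (low, high) bounds for an estimate and its margin of error."""
    return (estimate - margin, estimate + margin)


def overlaps(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


MARGIN = 3.2  # inferred from the quoted ranges (assumption)
clinton = interval(41.0, MARGIN)
trump = interval(38.0, MARGIN)

print(f"Clinton: {clinton[0]:.1f}% to {clinton[1]:.1f}%")
print(f"Trump:   {trump[0]:.1f}% to {trump[1]:.1f}%")
print("Too close to call" if overlaps(clinton, trump) else "Clear lead")
```

Students can change the estimates or the margin and see at what point the two intervals stop overlapping.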

Key Concepts: Correlation

    • Vocabulary issue: correlation means association
    • Ranges from −1 to +1; 0 means no (linear) relationship
      • Closer to ±1, stronger relationship
      • Positive: both variables go up or both go down, negative: one up, one down
    • Association is necessary, not sufficient for causation
      • Causation requires: time order, logical reason for relationship, no third variable
    • “Spurious correlation” = third variable causing the noted relationship

Time order highlighted because hard to determine in research other than longitudinal studies which tend to be expensive. Attitudinal research – both things measured at the same time, either could be “causing” the other.
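For anyone who wants to show where the number comes from, here is a minimal sketch that computes a Pearson correlation coefficient by hand. The classroom data are invented for illustration; one pair of variables rises together and the other pair moves in opposite directions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data, for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
test_score = [55, 62, 66, 74, 79, 88]      # tends to rise with study time
hours_gaming = [6, 5, 5, 3, 2, 1]
grade_average = [70, 74, 73, 82, 88, 91]   # tends to fall as gaming rises

print(f"study vs. score:  r = {pearson_r(hours_studied, test_score):+.2f}")
print(f"gaming vs. grade: r = {pearson_r(hours_gaming, grade_average):+.2f}")
```

The sign shows the direction of the relationship and the distance from zero shows its strength; neither says anything about which variable, if either, is the cause.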

Key Concepts:

http://www.nytimes.com/2014/10/12/upshot/heavier-babies-do-better-in-school.html

Fun Website: www.TylerVigen.com/spurious-correlations. Computer science project uses data mining to find “relationships.” Other examples: consumption of mozzarella cheese and civil engineering doctorates awarded; sociology doctorates awarded and number of non-commercial space flights; Japanese passenger cars sold in US with suicide by crashing of motor vehicle; length of winning word of Scripps Howard spelling bee with number of people killed by venomous spider…

Topics are odd enough that they help reinforce idea that just because two things move in a similar pattern does not necessarily mean they are related, let alone causal. Students will still try to come up with explanations for how/why the two things could be related.
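One way to let students see a third variable at work is a small simulation. The sketch below is built on invented assumptions: a hypothetical daily temperature drives both ice cream sales and sunburn cases, and neither outcome affects the other. Because the simulation knows exactly how temperature enters each series, it can subtract that part out and check what correlation is left.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is reproducible


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical third variable: daily temperature in degrees Celsius.
temperature = [random.uniform(10, 35) for _ in range(500)]

# Each outcome depends on temperature plus its own independent noise.
ice_cream_sales = [20 * t + random.gauss(0, 60) for t in temperature]
sunburn_cases = [3 * t + random.gauss(0, 10) for t in temperature]

raw_r = pearson_r(ice_cream_sales, sunburn_cases)

# Remove the temperature effect (known here by construction).
ice_cream_resid = [s - 20 * t for s, t in zip(ice_cream_sales, temperature)]
sunburn_resid = [c - 3 * t for c, t in zip(sunburn_cases, temperature)]
controlled_r = pearson_r(ice_cream_resid, sunburn_resid)

print(f"Raw correlation:             r = {raw_r:+.2f}")
print(f"After removing temperature:  r = {controlled_r:+.2f}")
```

The raw correlation should be strong, and the correlation after removing temperature should sit near zero. In real data the third variable's effect is not known in advance and has to be estimated, so this is a simplification.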

https://hbr.org/2015/06/beware-spurious-correlations

www.tylervigen.com/discover

Fun Website: www.TylerVigen.com/discover allows students to choose variables and create their own graphs.

Everything so far has been positive correlations – examples of negative correlations? Time spent playing video games and average grades; age and amount of hair on head for men; amount of exercise and body weight

Key Concepts:

Another real example of testing what was assumed to be a causal relationship: https://www.washingtonpost.com/news/to-your-health/wp/2015/12/10/happiness-wont-help-you-live-longer-but-unhappiness-wont-kill-you-either/

Key Concepts: Significance

    • Statistical significance: when one can be fairly certain that the result observed (e.g., strength of relationship, size of difference between two groups) did not happen by chance
    • Probability levels typically used:
      • p ≤ .05 = 95% certain
      • p ≤ .01 = 99% certain
      • p ≤ .001 = 99.9% certain
    • Calculation of significance related to sample size – large samples, easier to find significance
    • Statistical significance ≠ substantive significance

Another issue of vocabulary… In statistics, “significance” has a very specific meaning.

Easier to get results published when they are “significant” and news outlets often troll academic press releases
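The link between sample size and significance can be shown with a quick sketch. It assumes a standard two-sided z-test for the difference between two proportions with equal group sizes, and the numbers are hypothetical: 51% in one group versus 50% in the other, a gap most people would call trivial.

```python
from math import erfc, sqrt


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for a difference between two proportions (z-test)."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the mean
    se = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))  # two-sided tail area of the standard normal


# Same one-point difference, increasingly large samples.
for n in (100, 1_000, 10_000, 100_000):
    p = two_proportion_p_value(0.51, 0.50, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"n = {n:>7,} per group: p = {p:.4f} ({verdict} at .05)")
```

The difference between the groups never changes, yet with a large enough sample it crosses the .05 threshold. Whether a one-point gap matters in practice is a separate, substantive question.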

Key Concepts: Significance

News story reported in Washington Post: https://www.washingtonpost.com/news/to-your-health/wp/2015/08/11/the-most-depressing-statistic-imaginable-about-being-a-new-parent/?tid=a_inl

Based on article in journal, Demography. Good commentary: https://scatter.wordpress.com/2015/08/13/is-parenthood-really-worse-than-divorce-demographic-clickbait-in-the-washington-post/#more-9288 which includes chart on next slide.

Key Concepts: Significance

Putting it into Practice

Some websites to begin your search for information to use in class… There are many out there; these are just a smattering. Finding examples is always my favorite part of doing presentations like this because there is so much out there!

Where to Find Examples: Everywhere!

News Sites and Blogs

  • FiveThirtyEight (Nate Silver)
  • NYT’s The Upshot (including “Data Dive”)
  • Data in the News on TeachingwithData.org
  • Economist’s Graphic Detail
  • Pew Research Center
  • Huffington Post’s Pollster
  • USA Today Interactives

Specifically for Teaching

  • Teaching Quantitative Reasoning with the News (Pedagogy in Action – Carleton College)
  • AskForEvidence.org (includes lesson plans)
  • Association of Religion Data Archives’ Teaching Tools (specific plans for Jr./Sr. High)

These are just a few examples – many more out there!

Interactive Data Sites

  • Census Bureau’s American FactFinder
  • Social Science Data Analysis Network’s Data Counts!
  • Gapminder.org
  • PewResearch.org Interactives
  • General Social Survey (GSS) Data Explorer
  • Social Explorer
  • StatPlanet

Fun Facts

  • USA Live Stats
  • Worldometers
  • Data360
  • 100 Interesting Data Sets for Statistics

Screenshot from Health section of Worldometers.

Discussion Starters

  • Stories on Facebook, Twitter, etc.
  • Fox News graphics
  • FlowingData.org Mistaken Data posts
  • StatisticsHowTo.com/Misleading Graphs

Example: Percentages add to 120…

Where have you found information to use in class? What works best? What pitfalls have you encountered?

Recap:

Questions?

Lynette Hoelter

lhoelter@umich.edu

(734) 615-5653
