Chapter 4 - Of conjectures and uncertainty

Badri ADHIKARI

So far...

  • Some rules of thumb for interpreting data and visualizations
    • “Compared to what/who/where/when”
    • Look for pieces that are missing in the model
    • Increase depth and breadth up to a reasonable point
  • In addition to these, a visualization designer should be acquainted with the language of research
    • The methods of science will help us ascertain that we’re not being fooled by our sources
    • We will still be fooled on a regular basis, but we will be more careful

The scientific stance

  • Science is a stance
    • a way to look at the world, that everybody and anybody, regardless of cultural origins or background, can embrace
  • Science is a set of methods, a body of knowledge, & the means to communicate it
  • Scientific discovery follows an algorithm that looks like this (reorder, please):
    • Measure/test: You thoroughly measure the phenomenon, collect data, and test it.
    • Conclusions: Based on the evidence you obtained, you draw conclusions.
    • Theory: You put together a systematic set of interrelated hypotheses to describe, explain, or predict the phenomenon.
    • Hypothesis: You transform your conjecture into a formal and testable proposition (a hypothesis).
    • Conjecture: You grow curious about a phenomenon and formulate a plausible conjecture to describe it.
  • These steps may open researchers' eyes to new paths to explore
    • so they don’t constitute a process with beginning and end, but a loop
  • The scientific stance will never take us all the way to absolute, immutable truth
    • it only moves us further to the right in the truth continuum

Conjecture example 1: Athlete’s performance

  • Moving from ‘curiosity’ to ‘conjecture’ is the first critical step
    • if your conjecture does not make sense, your theory will be meaningless
  • A conjecture is a statement that hasn't been rigorously proven yet
    • Conjectures arise when one notices a pattern that holds true for many cases
  • Properties of a good/rational conjecture:
    • It makes sense intuitively in the light of what we know about the world
    • It is testable somehow
    • It is made of ingredients that are naturally and logically connected to each other
      • if you change any of them, the entire conjecture crumbles
  • Say you observe a phenomenon:

  • Which of the following conjectures makes more sense?
    • C1: “Appearing on the cover of Sports Illustrated magazine makes many athletes perform worse than they did before”
    • C2: “Athletes are usually featured on magazine covers when they are at the peak of their careers”

Conjecture example 2: China teapot

“I suggest that between the Earth and Mars there is a china teapot revolving about the sun in an elliptical orbit. The teapot is too small to be revealed even by the most powerful telescopes.” - Philosopher Bertrand Russell.

  • He makes it worse by mentioning:
    • “Since my assertion cannot be disproved, you should not doubt it.”

  • Of course, nobody can disprove such an assertion
    • i.e., it cannot be tested
  • For a conjecture to be good it also needs to be testable
    • Being testable also implies being falsifiable
  • Sadly, we love to come up with non-testable conjectures
    • and we use them when arguing with others

Conjecture example 3: Infectious disease in Africa

A sparsely populated region in Africa is being ravaged by an infectious disease. You observe that people become ill mostly after attending religious services on Sunday. Which is a more strongly connected conjecture?

Conjecture 1: You are a local shaman and propose that the origin of the disease is some sort of negative energy that oozes out of the spiritual aura of the priests and permeates the temples where they preach.

How to test? Check if people get the disease when they gather in the temples.

Conjecture 2: The disease may be transmitted in crowded places because the agent that provokes it is airborne.

How to test? Check at other crowded places as well.

  • All components of the conjecture should be naturally connected to each other
    • Take away “crowd”, “agent”, “airborne”, and your conjecture falls apart
    • Take away “priests”, and you could replace it with “invisible pixies who fly inside the temples”
  • A good conjecture has all its components naturally connected to each other

Hypothesizing: Twitter usage example

  • A conjecture that is formalized to be tested empirically is a hypothesis
  • Conjecture:
    • “X percent of Twitter usage a day leads a majority of writers to a Y percent decrease in productivity”
  • How to translate this to a hypothesis?
    • Define what “productivity” means in measurable terms
  • Hypothesis:
    • “Each increase of Twitter usage reduces the average number of words that writers are able to write in a day”
  • Independent variable (predictor/explanatory variable)
    • “increase in Twitter usage”
  • Dependent variable (outcome/response variable)
    • “average number of words that writers write in a day”
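
To make the roles of the two variables concrete, here is a minimal sketch in Python. The numbers are invented for illustration only, measuring Twitter usage in hours per day is an assumption, and the names `least_squares_slope`, `twitter_hours`, and `words_written` are hypothetical. The sketch fits a straight line and reads its slope as the average change in the dependent variable for each one-unit increase in the independent variable.

```python
from statistics import mean

def least_squares_slope(x, y):
    """Slope b of the ordinary least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator

# Hypothetical data, invented for illustration only.
twitter_hours = [0, 1, 2, 3, 4]                  # independent variable
words_written = [2000, 1800, 1600, 1400, 1200]   # dependent variable

slope = least_squares_slope(twitter_hours, words_written)
print(slope)  # -200.0: in this made-up data, each extra hour goes with 200 fewer words
```

A negative slope in real data would be consistent with the hypothesis, but on its own it would not show that Twitter usage causes the drop.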

Studies: Cross-sectional/observational & longitudinal

  • An observational study, or cross-sectional study, takes into account data collected at just one particular point in time.
    • Faster, but the results may not be very conclusive
  • A longitudinal study takes into account data collected over a long time (a year, a decade, etc.)
    • It is more difficult and expensive to follow the same people for a long time
  • Which of the following data collections is observational?

Sample vs population & confounding vs lurking variable

  • The inference we are trying to draw is: “writers can benefit from using Twitter less”
    • i.e., we are trying to study something about a population (all writers) based on a sample we collected
  • Is the sample representative of the population?

  • A confounding variable is an extraneous variable which we incorporate into our model
    • because we know that it may affect our results
    • Example: “geekiness level of the writers”
  • A lurking variable is an extraneous variable which we don’t include in our analysis
    • simply because its existence is unknown

  • When analyzing data/visualizations, when reading studies, polls, surveys, etc., we should always ask: did the authors rigorously search for lurking variables and transform them into confounding variables?
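
The following sketch illustrates why an unaccounted-for variable matters. The records are invented so that “geekiness” alone determines how much a writer produces, while Twitter usage has no effect at all. The names `writers`, `mean_words`, and `heavy_minus_light` are hypothetical. Comparing heavy and light Twitter users without looking at geekiness suggests a large gap, and the gap disappears once geekiness is included in the comparison.

```python
from statistics import mean

# Hypothetical records, invented for illustration:
# (geeky, heavy_twitter_user, words_per_day)
writers = (
    [(False, False, 2000)] * 4 + [(False, True, 2000)] * 1
    + [(True, False, 1000)] * 1 + [(True, True, 1000)] * 4
)

def mean_words(records, heavy, geeky=None):
    """Average words per day for heavy or light users,
    optionally restricted to one geekiness level."""
    return mean(
        words for g, h, words in records
        if h == heavy and (geeky is None or g == geeky)
    )

def heavy_minus_light(records, geeky=None):
    """Difference in average words: heavy Twitter users minus light users."""
    return mean_words(records, True, geeky) - mean_words(records, False, geeky)

gap_ignoring_geekiness = heavy_minus_light(writers)          # geekiness is lurking
gap_among_non_geeks = heavy_minus_light(writers, geeky=False)
gap_among_geeks = heavy_minus_light(writers, geeky=True)

print(gap_ignoring_geekiness, gap_among_non_geeks, gap_among_geeks)  # -600 0 0
```

In this made-up data the apparent 600-word penalty comes entirely from geeky writers being more likely to use Twitter heavily.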

Controlled experiments

Whenever it is realistic to do so, you should go beyond observational studies and design controlled experiments. They can help minimize the influence of confounding variables. Characteristics of controlled experiments (please reorder):

  • Observe a large number of subjects
    • subjects should be representative of the population you want to learn about
  • Compare: what happens to the subjects in the control and the experimental group
    • if the differences between the experimental and control groups are noticeable enough, you can conclude that the condition under study may have played some role
  • Divide the subjects into two groups
    • the experimental group and the control group
  • Expose the subjects in the experimental group to some sort of condition
    • the control group may or may not be exposed to any condition
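
One possible way to put these characteristics into practice is sketched below in Python. It assumes subjects are assigned to the two groups at random, which is a common design choice rather than something stated above, and the helper names `assign_groups` and `difference_in_means` as well as the outcome numbers are hypothetical.

```python
import random
from statistics import mean

def assign_groups(subjects, seed=0):
    """Randomly split subjects into (experimental, control) groups of near-equal size."""
    shuffled = list(subjects)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(experimental_outcomes, control_outcomes):
    """Average outcome in the experimental group minus the control group."""
    return mean(experimental_outcomes) - mean(control_outcomes)

# Hypothetical subjects standing in for a representative sample of writers.
subjects = [f"writer_{i}" for i in range(100)]
experimental, control = assign_groups(subjects)

# After exposing only the experimental group to the condition, compare outcomes.
# These outcome values are invented for illustration.
effect = difference_in_means([1500, 1600, 1700], [1900, 2000, 2100])
print(len(experimental), len(control), effect)  # 50 50 -400
```

Random assignment is what tends to spread extraneous variables evenly across the two groups, so that the condition is the main systematic difference between them.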

Sample variation

  • If we pick a random sample of 1,000 people to analyze political opinions in the United States, we cannot be 100% certain that they are perfectly representative of the entire country
    • no matter how thorough we are
    • If we pick a completely different random sample, the results may be slightly different
  • Sample variation: Even if our method of drawing the random sample is rigorous, there will always be some amount of uncertainty.
  • Because of this uncertainty, researchers will never report just an exact number after they have observed a sample of 1,000 people
    • instead, they will report the number along with a degree of confidence (usually 95%) and a margin of error of +/- some number of percentage points
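
As a rough illustration of where such a margin of error comes from, the sketch below uses the standard normal-approximation formula for a proportion, z * sqrt(p * (1 - p) / n). It assumes a simple random sample and the worst case p = 0.5. Real polls often apply further adjustments, and the function name `margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample; proportion=0.5 gives the widest margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    return z * sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(1000)
print(f"+/- {moe * 100:.1f} percentage points")  # +/- 3.1 percentage points
```

Under these assumptions, quadrupling the sample to 4,000 people only halves the margin, which is one reason polls rarely go far beyond samples of this size.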

Dotted lines are the range of error using the older scenarios in which women would have 0.5 children more or less than what’s predicted.

Shaded regions are the uncertainties. The darker shading is the 80 percent confidence bars, and the lighter shading shows the 95 percent confidence bars.

Summary

  • Scientific discovery consists of an algorithm:
    • Curiosity to conjecture
    • Conjecture to hypothesis (formal and testable)
    • Measure, test, and perform experiments/studies to test the hypothesis
      • Observational and longitudinal
  • We should rigorously search for a lurking variable and transform it to a confounding variable
  • In a controlled experiment, we divide the subjects into two groups (experimental and control)
  • Polling/surveying on a sample (not on whole population) always has uncertainty associated with it
