Lecture 8
Histograms
DATA 8
Spring 2022
Distributions
Terminology
A Distribution
Source: Pew Research
Each individual is in exactly one category. Percents add up to 100.
Not a Distribution
Percents of survey respondents on “a major reason they would find it difficult to quarantine themselves for at least 14 days”
Source: Pew Research
Each respondent can pick more than one answer.
The bars represent overlapping groups.
Categorical Distributions
(Demo)
Bar Chart
To display all the values of the variable along with all their frequencies
(Demo)
Numerical Distributions
Grouping Numerical Values: Binning
Binning is counting the number of numerical values that lie within ranges, called bins.
188, 170, 189, 163, 183, 171, 185, 168, 173, ...
Bin endpoints on the number line: 160, 165, 170, 175, 180, 185, 190
The [185, 190) bin: includes 185, excludes 190
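A minimal plain-Python sketch of binning, using only the nine values listed above (the slide's list continues with "...", and those further values are not used here). The function name `count_in_bins` is hypothetical and chosen only for illustration.

```python
# The nine values shown on the slide; the remaining values are elided there.
values = [188, 170, 189, 163, 183, 171, 185, 168, 173]

# Bin endpoints from the number line: six bins, each 5 units wide.
bin_edges = [160, 165, 170, 175, 180, 185, 190]


def count_in_bins(values, edges):
    """Count how many values fall in each bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            # Left endpoint included, right endpoint excluded.
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


counts = count_in_bins(values, bin_edges)
for i, c in enumerate(counts):
    print(f"[{bin_edges[i]}, {bin_edges[i + 1]}): {c}")
```

Note that the value 185 is counted in [185, 190), not in [180, 185), because each bin includes its left endpoint only.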
(Demo)
Area Principle
What Is Wrong With This Picture?
Caption: The new iPad battery is 70% bigger than the previous iPad.
Area Principle
Areas should be proportional to the values they represent.
For example
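As a sketch of why this matters, suppose (an assumption, since the slide's picture is not reproduced here) that an illustrator shows a "70% bigger" quantity by scaling both the width and the height of an icon by 1.7. The drawn area then grows by much more than 70%. The variable names are hypothetical.

```python
import math

# The caption's claim: the new value is 70% bigger, i.e. 1.7 times the old one.
value_ratio = 1.7

# Assumed misleading drawing: width AND height are both scaled by 1.7.
misleading_area_ratio = value_ratio * value_ratio

# A drawing that follows the area principle scales each side by sqrt(1.7),
# so that the area itself grows by a factor of 1.7.
side_scale = math.sqrt(value_ratio)
faithful_area_ratio = side_scale * side_scale

print(f"Area ratio when both sides are scaled by 1.7: {misleading_area_ratio:.2f}")
print(f"Side scale that keeps area proportional:      {side_scale:.3f}")
print(f"Area ratio with that side scale:              {faithful_area_ratio:.2f}")
```

Under this assumption the picture would suggest the new value is almost three times the old one, rather than 1.7 times.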
Drawing Histograms
Histogram
(Demo)
Areas will still represent percents.
68%
Density
Histogram Axes
(Demo)
How to Calculate Height
The [40, 65) bin contains 56 out of 200 movies
Height of bar = 28 percent / 25 years = 1.12 percent per year
Height Measures Density
Height = (% in bin) / (width of bin)
Area Measures Percent
Area of bar = % in bin = Height x width of bin
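The two relationships above can be checked with the [40, 65) example from the earlier slide (56 of 200 movies). This is a small plain-Python sketch; the function names are hypothetical.

```python
def percent_in_bin(count, total):
    """Percent of all individuals that fall in the bin."""
    return 100 * count / total


def bar_height(percent, width):
    """Density: percent per unit on the horizontal axis."""
    return percent / width


def bar_area(height, width):
    """Area of the bar, which recovers the percent in the bin."""
    return height * width


width = 65 - 40                       # the [40, 65) bin is 25 years wide
percent = percent_in_bin(56, 200)     # 56 out of 200 movies
height = bar_height(percent, width)   # percent per year
area = bar_area(height, width)        # back to percent

print(f"percent in bin: {percent:.2f}")
print(f"height:         {height:.2f} percent per year")
print(f"area:           {area:.2f} percent")
```

Height and area answer different questions: height describes how crowded the bin is, while area describes how much of the data the bin holds.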
Discussion Questions
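Compare the bins [10, 20) and [20, 40).
Which bin contains the larger percent of the data? Answer: [20, 40), because its bar has the bigger area.
Which bin is more crowded? Answer: [10, 20), because its bar is taller.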
Bar Chart or Histogram?
Bar Chart
Histogram
To display a distribution:
Discussion Questions
What is the height of each bar in this histogram?
my_bins = make_array(0, 15, 25, 85)
incomes.hist(1, bins = my_bins)
What are the vertical axis units?
Name | 2016 Income (millions) |
Jennifer Lawrence | 61.7 |
Scarlett Johansson | 57.5 |
Angelina Jolie | 40 |
Jennifer Aniston | 24.75 |
Anne Hathaway | 24 |
Melissa McCarthy | 24 |
Bingbing Fan | 20 |
Sandra Bullock | 20 |
Cara Delevingne | 15 |
Reese Witherspoon | 15 |
Amy Adams | 15 |
Kristen Stewart | 12 |
Amanda Seyfried | 10.5 |
Tina Fey | 10.5 |
Julia Roberts | 10 |
Emma Stone | 10 |
Natalie Portman | 8.5 |
Margot Robbie | 8 |
Meryl Streep | 6 |
Mila Kunis | 4.5 |
Answers
Vertical axis units: Percent per million
my_bins = make_array(0,15,25,85)
[0, 15): (45%) / (15 million) = 3% per million
[15, 25): (40%) / (10 million) = 4% per million
[25, 85): (15%) / (60 million) = 0.25% per million
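The answers above can be reproduced with a short plain-Python sketch that counts the 20 incomes from the table into the three bins. It is an illustration of the height formula, not the implementation behind `incomes.hist`; the function name `histogram_heights` is hypothetical.

```python
# 2016 incomes in millions, copied from the table on the slide.
incomes = [61.7, 57.5, 40, 24.75, 24, 24, 20, 20, 15, 15,
           15, 12, 10.5, 10.5, 10, 10, 8.5, 8, 6, 4.5]

# Same endpoints as my_bins = make_array(0, 15, 25, 85).
my_bins = [0, 15, 25, 85]


def histogram_heights(values, edges):
    """Return the height (percent per unit) of each bar."""
    total = len(values)
    heights = []
    for left, right in zip(edges, edges[1:]):
        # Every bin is treated as [left, right). No income equals 85, so the
        # treatment of the final endpoint does not matter for this data.
        count = sum(1 for v in values if left <= v < right)
        percent = 100 * count / total
        heights.append(percent / (right - left))
    return heights


heights = histogram_heights(incomes, my_bins)
for (left, right), h in zip(zip(my_bins, my_bins[1:]), heights):
    print(f"[{left}, {right}): {h} percent per million")
```

The three incomes of exactly 15 land in [15, 25), which is why that bin holds 8 of the 20 values (40%) and [0, 15) holds 9 (45%).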