Lecture 9
Histograms
DATA 8
Summer 2017
Slides created by John DeNero (denero@berkeley.edu), Ani Adhikari (adhikari@berkeley.edu), and Sam Lau (samlau95@berkeley.edu)
Bar Charts (Review)
Types of Data
All values in a column should be of the same type and comparable to each other in some way
“Numerical” Data
Just because the values are numbers doesn’t mean the variable is numerical
Bar Charts
Compare some quantity across categories
(Demo)
Discussion Question
Top 10 highest-grossing movies
How long ago each one was released
Bar Charts of Counts
Distributions:
Bar charts can display the distribution of categorical values
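As a rough illustration of a bar chart of counts, the sketch below tallies a categorical column and prints one text bar per category. The studio names and the list itself are made up for this example; they are not data from the lecture.

```python
from collections import Counter

# Hypothetical categorical column (invented for illustration):
# the studio behind each of six movies.
studios = ["Fox", "Disney", "Fox", "MGM", "Disney", "Fox"]

# The distribution of a categorical variable is the count of each category.
counts = Counter(studios)

# Print a text bar chart, one bar per category, most common first.
for category, count in counts.most_common():
    print(f"{category:<8}{'#' * count} ({count})")
```

Because the categories have no numerical order, the bars can be listed in any order; sorting by count is just one convenient choice.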
(Demo)
Binning
Binning Numerical Values
Binning is counting the number of numerical values that lie within ranges, called bins.
188, 170, 189, 163, 183, 171, 185, 168, 173, ...
(Figure: number line marked at 160, 165, 170, 175, 180, 185, 190, with the [185, 190) bin labeled)
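The binning step can be sketched in plain Python. This example uses only the nine values shown above (the rest of the list is elided in the slide) and bin boundaries every 5 units from 160 to 190. The helper name bin_counts is hypothetical, and every bin here is treated as half-open, so a bin such as [185, 190) includes 185 but not 190.

```python
# The nine values shown on the slide; the remaining values are elided there.
values = [188, 170, 189, 163, 183, 171, 185, 168, 173]

# Bin boundaries: 160, 165, ..., 190.
edges = list(range(160, 191, 5))

def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [lo, hi).

    Hypothetical helper for illustration. A value equal to the last
    edge (or outside the edges entirely) is not counted in any bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

counts = bin_counts(values, edges)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"[{lo}, {hi}): {c}")
```

Note that 185 is counted in [185, 190), not in [180, 185), because each bin includes its left endpoint only.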
Histogram
Chart to display the distribution of numerical values using bins
(Demo)
The Density Scale
Histogram Axes
By default, hist uses the density scale (normed=True), which ensures that the areas of the bars sum to 100%
(Demo)
How to Calculate Height
The [20, 40) bin contains 59 out of 200 movies
Height of bar = 29.5 percent / 20 years = 1.475 percent per year
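The same arithmetic can be checked step by step in Python. The variable names are chosen for this sketch; the numbers (59 of 200 movies, a bin from 20 to 40 years) come from the example above.

```python
# Numbers from the example: the [20, 40) bin holds 59 of 200 movies.
count_in_bin = 59
total_count = 200
bin_start, bin_end = 20, 40

# Step 1: percent of all movies that fall in the bin.
percent_in_bin = 100 * count_in_bin / total_count

# Step 2: width of the bin, in years.
bin_width = bin_end - bin_start

# Step 3: height of the bar, in percent per year.
height = percent_in_bin / bin_width

print(f"{percent_in_bin} percent / {bin_width} years = {height:.3f} percent per year")
```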
Height Measures Density
Height = % in bin / width of bin
(Demo)
Area Measures Percent
Area = % in bin = Height x width of bin
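To see why the bar areas sum to 100%, the sketch below computes heights for bins of unequal width and then multiplies each height back by its bin width. The helper density_heights, the bin boundaries, and the first three counts are assumptions made up for illustration; only the last bin, [20, 40) with 59 of 200 movies, matches the earlier example.

```python
def density_heights(counts, edges):
    """Return bar heights in percent per unit for each bin.

    Hypothetical helper: height = (percent in bin) / (width of bin).
    """
    total = sum(counts)
    heights = []
    for count, lo, hi in zip(counts, edges, edges[1:]):
        percent = 100 * count / total
        heights.append(percent / (hi - lo))
    return heights

# Invented bins of unequal width; the last bin is [20, 40).
edges = [0, 5, 10, 20, 40]
# Invented counts totalling 200; the last count (59) is from the example.
counts = [30, 50, 61, 59]

heights = density_heights(counts, edges)

# Area of each bar = height x width of bin = percent in bin.
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]

for lo, hi, h, a in zip(edges, edges[1:], heights, areas):
    print(f"[{lo}, {hi}): height {h:.3f} percent per unit, area {a:.1f} percent")
print(f"Total area: {sum(areas):.1f} percent")
```

Under these assumed counts, the widest bin has the lowest bar even though it holds about as many movies as the bin before it, which is the reason to read a histogram by area rather than by height.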
Chart Types
Bar Chart vs. Histogram
Bar Chart
Histogram
Overlaid Graphs
For visually comparing two populations
(Demo)