1 of 105

Week 3

Essential Data Visualization

2 of 105

Agenda

  • Data Visualisation - The good and the bad.
  • The essential data visualizations
  • Stylizing your plots

3 of 105

4 of 105

5 of 105

Good Data Visualisations

6 of 105

7 of 105

8 of 105

9 of 105

10 of 105

Austria Solar Energy Report

11 of 105

HOT OR NOT

12 of 105

13 of 105

14 of 105

15 of 105

16 of 105

17 of 105

18 of 105

19 of 105

20 of 105

21 of 105

22 of 105

23 of 105

Set Up Google Colab

24 of 105

25 of 105

26 of 105

27 of 105

28 of 105

Relevant columns: Date

  • Date - The date of the observation
    • First: 2015-01-04
    • Last: 2018-03-25

29 of 105

Relevant columns: AveragePrice

  • AveragePrice - the average price of a single avocado
    • Median: 1.37
    • Mean: 1.40
    • std: 0.40
    • min: 0.44
    • max: 3.2

30 of 105

Relevant columns: Type

  • Type - conventional or organic
    • Equal distribution of both types

31 of 105

QUIZ: What type of data is the column ‘type’?

32 of 105

Relevant columns: region

  • Region - the city or region of the observation
    • 54 unique values

33 of 105

Relevant columns: Total Volume

  • Total Volume - Total number of avocados sold
    • mean: 850644.0
    • median: 107376.76
    • std: 3453545.0
    • min: 84.56
    • max: 62505650.0

34 of 105

Relevant columns: 4225

  • 4225 - Total number of avocados with PLU 4225 sold

35 of 105

Relevant columns: 4770

  • 4770 - Total number of avocados with PLU 4770 sold

36 of 105

Relevant columns: 4046

  • 4046 - Total number of avocados with PLU 4046 sold

37 of 105

38 of 105

Plotly Express

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on "tidy" data and produces easy-to-style figures.

39 of 105

Remember this?

40 of 105

Histogram

A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape and spread of continuous sample data.

41 of 105

Histogram of Conventional Avocado Prices

42 of 105

DEMO

43 of 105

What is the price range with the highest count? (organic)

Answer: 1.58-1.59

44 of 105

Comparing two histograms

45 of 105

DEMO

46 of 105

Box Plot

A box plot displays the five-number summary of a data set. The five-number summary is the minimum, first quartile, median, third quartile, and maximum.
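The five-number summary can be sketched with the Python standard library alone. The prices below are hypothetical, and the fences follow the common 1.5 × IQR (Tukey) convention, which is an assumption here; plotting libraries may compute quartiles slightly differently.

```python
import statistics

# Hypothetical prices, not taken from the avocado dataset.
prices = [1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 3.0]

# Quartiles: the three cut points that split the data into four parts.
q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
five_number_summary = (min(prices), q1, median, q3, max(prices))

# Points further than 1.5 * IQR from the box count as outliers.
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# The fences are the most extreme data points still inside those limits.
lower_fence = min(p for p in prices if p >= lower_limit)
upper_fence = max(p for p in prices if p <= upper_limit)
outliers = [p for p in prices if p < lower_fence or p > upper_fence]
```

With these sample values the point 3.0 lies beyond the upper limit, so it would be drawn as a separate outlier marker rather than as part of the whisker.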

47 of 105

48 of 105

Box Plot of Conventional Avocado Prices

49 of 105

DEMO

50 of 105

Quiz: Plot a box plot for organic avocados

What is the value of the upper fence?

Answer: 2.54

51 of 105

Box Plot for each type of Avocado

52 of 105

DEMO

53 of 105

Line Plot

A line chart or line plot or line graph or curve chart is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments.

54 of 105

Average Price of Conventional Avocado Over Time

55 of 105

DEMO

56 of 105

Quiz: Make the same plot for organic avocados. What is the date of the highest peak?

Answer: 27 Aug 2017

57 of 105

Average Price of Avocado Over Time

58 of 105

DEMO

59 of 105

Bar plot

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

60 of 105

Bar Plot vs Histogram

61 of 105

Average Price of Conventional Avocado in 2017

62 of 105

DEMO

63 of 105

Quiz: How many rows does df_2017 have?

Answer: 5722

64 of 105

Groupby Exercise: How many rows and columns does "df_2017_region_price" have?

Answer: 108 rows, 3 columns
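A sketch of a groupby that would yield a table of this shape. Grouping by `region` and `type` is an assumption, chosen because 54 regions times 2 types gives 108 rows and the result has 3 columns. The rows below are hypothetical, so the sample result is much smaller.

```python
import pandas as pd

# Hypothetical stand-in rows: one observation from 2016, six from 2017.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2016-12-25", "2017-01-01", "2017-01-08", "2017-01-01",
         "2017-01-08", "2017-01-01", "2017-01-01"]
    ),
    "region": ["Houston", "Houston", "Houston", "Houston",
               "Houston", "SanFrancisco", "SanFrancisco"],
    "type": ["conventional", "conventional", "conventional", "organic",
             "organic", "conventional", "organic"],
    "AveragePrice": [0.70, 0.80, 0.90, 1.40, 1.60, 1.20, 2.00],
})

# Keep only the observations from 2017.
df_2017 = df[df["Date"].dt.year == 2017]

# One row per (region, type) pair with the mean price for the year.
# as_index=False keeps region and type as ordinary columns.
df_2017_region_price = df_2017.groupby(
    ["region", "type"], as_index=False
)["AveragePrice"].mean()

rows, columns = df_2017_region_price.shape
```

With two regions and two types in the sample, `df_2017_region_price` has 4 rows and 3 columns.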

65 of 105

66 of 105

DEMO

67 of 105

Average Price of Avocado in 2017

68 of 105

DEMO

69 of 105

Scatter Plot

A scatter plot uses coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

70 of 105

Correlation between Average Price and Total Volume

71 of 105

DEMO

72 of 105

Correlation in Houston and San Francisco

73 of 105

DEMO

74 of 105

75 of 105

Simple text

Simple text can be a great way to communicate.

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

77 of 105

Never use 3D

80 of 105

Repeat after me

I will never use 3D plots to impress my managers.

81 of 105

82 of 105

Exception: 3D Scatter Plots

83 of 105

Bar plot as an alternative to pie charts

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

85 of 105

The secret of good plots:

Remove the clutter.

86 of 105

If you don’t remember anything else,

remember just this.

87 of 105

88 of 105

89 of 105

Step 1

Remove what has no purpose

91 of 105

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

92 of 105

Gridlines?

Do they help the viewer?

No Gridlines = Better Contrast

93 of 105

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

94 of 105

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

95 of 105

Step 2

Clean your axes.

96 of 105

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

97 of 105

Step 3

The more obvious, the better.

99 of 105

3483075861872364872136478632875632187640732160873016472613587687215982374872134986213075732984798321503281479832174902579821734987321498278749832174903217

100 of 105

How many 0s?

101 of 105

3483075861872364872136478632875632187640732160873016472613587687215982374872134986213075732984798321503281479832174902579821734987321498278749832174903217

102 of 105

103 of 105

Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, New Jersey: John Wiley & Sons, Inc.

104 of 105

Book Recommendation

105 of 105

And no, don't listen to the audiobook.