
Visualization

NAME – PRIYANKA LOKHANDE.


DATA VISUALIZATION

Data visualization is the practice of representing information and data in visual form, such as charts, diagrams, and pictures. It translates large amounts of complex numerical data into diagrams so that the data is easier to understand.

ggplot2 is a free, open-source, and easy-to-use visualization package for the R programming language. It is based on the Grammar of Graphics (the "gg" in its name) and is widely used in R.

It is one of the most powerful and popular visualization packages in R.
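As a minimal sketch of the grammar-of-graphics idea, assuming ggplot2 is installed, a plot is built from a data set, aesthetic mappings declared in aes(), and layers added with +. The choice of variables and labels here is only illustrative.

```r
# Assumes the ggplot2 package is installed: install.packages("ggplot2")
library(ggplot2)

# Data: R's built-in airquality data set.
# Aesthetics: map temperature to the x-axis and ozone to the y-axis.
# Geometry: draw one point per observation.
p <- ggplot(airquality, aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +   # silently drop rows with missing Ozone
  labs(title = "Ozone vs. temperature",
       x = "Temperature (Fahrenheit)",
       y = "Ozone (ppb)")

print(p)  # render the plot on the current graphics device
```

Each component (data, mapping, geometry, labels) can be swapped independently, which is what makes the grammar approach flexible.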


In R, data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data. Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions. The idea of using pictures to understand data has been around for centuries.

Common ways of presenting data visually include charts, tables, graphs, maps, and dashboards.

Common types of data visualization in R are:

Bar Plot

Histogram

Box Plot

Scatter Plot

Heat Map

Map Visualization

3D Graphics


Bar Plot

There are two types of bar plots:

Vertical

Horizontal

The length of each bar is proportional to the value of the data item it represents. Bar plots are generally used for plotting categorical and continuous variables.

By setting the horiz parameter of barplot() to TRUE or FALSE, we get a horizontal or a vertical bar plot, respectively.
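Bar plots are most natural for categorical data. A minimal sketch, assuming we want one bar per month, that averages ozone by month with tapply() and draws the result horizontally; the grouping and labels are illustrative choices, not part of the original example.

```r
data(airquality)

# Mean ozone per month; na.rm = TRUE skips the missing (NA) readings
month_means <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)

# One bar per month; Month values 5-9 are turned into "May".."Sep" labels.
# horiz = TRUE lays the bars out horizontally; las = 1 keeps labels upright.
barplot(month_means,
        names.arg = month.abb[as.integer(names(month_means))],
        main = "Mean ozone concentration by month",
        xlab = "Ozone (ppb)",
        horiz = TRUE, las = 1)
```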


Consider the following rows (the first six) of R's built-in airquality data set, which is used for the visualizations below. NA marks a missing value.

Ozone   Solar.R   Wind   Temp   Month   Day
41      190       7.4    67     5       1
36      118       8.0    72     5       2
12      149       12.6   74     5       3
18      313       11.5   62     5       4
NA      NA        14.3   56     5       5
28      NA        14.9   66     5       6
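The data set can also be inspected directly in R. A short sketch; airquality is part of the datasets package that ships with R, so nothing needs to be downloaded.

```r
# airquality ships with base R (datasets package), so no file is needed
data(airquality)

head(airquality)            # the first six rows, as in the table
dim(airquality)             # number of rows and columns
colSums(is.na(airquality))  # number of missing (NA) values per column
```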


# Load the built-in airquality data set
data(airquality)

# Horizontal bar plot of ozone concentration in air
barplot(airquality$Ozone,
        main = 'Ozone Concentration in Air',
        xlab = 'Ozone levels (ppb)', horiz = TRUE)

# Vertical bar plot of ozone concentration in air
# (the values now run along the y-axis, so the label goes in ylab)
barplot(airquality$Ozone,
        main = 'Ozone Concentration in Air',
        ylab = 'Ozone levels (ppb)', col = 'blue', horiz = FALSE)


Histogram

A histogram is like a bar chart in that it uses bars of varying height to represent a data distribution.

However, in a histogram the values are grouped into consecutive intervals called bins.

Continuous values are grouped and displayed in these bins, whose size can be varied (in hist(), through the breaks argument).
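The bin edges can be set explicitly with the breaks argument. A minimal sketch using 5-degree bins (an arbitrary, illustrative width); with plot = FALSE, hist() returns the bin edges and counts instead of drawing them.

```r
data(airquality)

# Bins 5 degrees wide from 55 to 100 F; breaks sets the bin edges explicitly
bins <- seq(55, 100, by = 5)

# plot = FALSE returns the binning without drawing, so the counts can be inspected
h <- hist(airquality$Temp, breaks = bins, plot = FALSE)
data.frame(from = head(h$breaks, -1), to = h$breaks[-1], count = h$counts)

# Draw the histogram with the same bins
hist(airquality$Temp, breaks = bins, col = "yellow",
     main = "Maximum daily temperature, 5-degree bins",
     xlab = "Temperature (Fahrenheit)")
```

Changing by = 5 to a smaller or larger step shows how the bin width changes the shape of the distribution.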

# Histogram for maximum daily temperature
data(airquality)
hist(airquality$Temp,
     main = "La Guardia Airport's\nMaximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)


Box Plot

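A box plot presents a statistical summary of the data graphically.

It shows the minimum and maximum data points, the median, the first and third quartiles, and the interquartile range. In R's boxplot(), the whiskers extend to the most extreme data points lying within 1.5 times the box length (roughly the interquartile range) of the box, and points beyond them are drawn individually as outliers.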
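The numbers behind a box plot can be inspected with boxplot.stats(). A minimal sketch for the wind-speed data; note that the box edges (hinges) come from fivenum() and can differ slightly from the quartiles returned by quantile().

```r
data(airquality)

# boxplot.stats() returns the numbers a box plot draws:
#   $stats = lower whisker, lower hinge (about Q1), median,
#            upper hinge (about Q3), upper whisker
#   $out   = points beyond the whiskers, drawn individually as outliers
s <- boxplot.stats(airquality$Wind)
s$stats
s$out

# Interquartile range computed directly from the quartiles
IQR(airquality$Wind)
```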

# Box plot for average wind speed
data(airquality)
boxplot(airquality$Wind,
        main = "Average wind speed\nat La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE, notch = TRUE)


Multiple box plots can also be generated at once with the following code:

# Multiple box plots, one for each air-quality parameter
# (columns 1 to 4: Ozone, Solar.R, Wind, Temp)
boxplot(airquality[, 1:4],
        main = 'Box Plots for Air Quality Parameters')

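Boxes can also be grouped by a factor rather than by column. A minimal sketch using the formula interface of boxplot() to group maximum daily temperature by Month; the labels and colors are illustrative choices.

```r
data(airquality)

# Formula interface: one box of Temp for each value of Month.
# boxplot() invisibly returns the computed statistics, stored here in b.
b <- boxplot(Temp ~ Month, data = airquality,
             names = month.abb[5:9],   # Month values 5-9 are May-September
             main = "Maximum daily temperature by month",
             xlab = "Month", ylab = "Temperature (Fahrenheit)",
             col = "lightblue")
```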

Scatter Plot

A scatter plot is composed of many points on a Cartesian plane. Each point represents the values of two variables for one observation, which helps us identify the relationship between them.
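A scatter plot is often paired with a correlation coefficient to quantify the relationship it shows. A minimal sketch plotting ozone against temperature (a pairing chosen for illustration) and computing Pearson's r on the rows where both values are present.

```r
data(airquality)

# Scatter plot of ozone against temperature using the formula interface
plot(Ozone ~ Temp, data = airquality, pch = 19,
     main = "Ozone vs. temperature",
     xlab = "Temperature (Fahrenheit)", ylab = "Ozone (ppb)")

# Pearson correlation, using only rows where both values are present
r <- cor(airquality$Temp, airquality$Ozone, use = "complete.obs")
r
```

A value of r close to +1 indicates a strong positive linear relationship, which matches the upward trend visible in the plot.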

# Scatter plot for Ozone Concentration per month

data(airquality)

plot(airquality$Ozone, airquality$Month,

main ="Scatterplot Example",

xlab ="Ozone Concentration in parts per billion",

ylab = "Month of observation", pch = 19)


Heat Map

A heat map is a graphical representation of data that uses colors to visualize the values of a matrix. In R, the heatmap() function is used to plot a heat map.

# Set seed for reproducibility
set.seed(110)
# Create example data: a 5 x 5 matrix of normal random values (mean 0, sd 5)
data <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5)
# Row and column names
colnames(data) <- paste0("col", 1:5)
rownames(data) <- paste0("row", 1:5)
# Draw a heat map
heatmap(data)

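By default, heatmap() reorders rows and columns using hierarchical clustering (drawing dendrograms) and scales the values by row. A minimal sketch of turning both off, so the cells appear in the original matrix order with unscaled colors; the matrix and color palette are only illustrative.

```r
set.seed(110)  # reproducible random values
m <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5,
            dimnames = list(paste0("row", 1:5), paste0("col", 1:5)))

# Rowv = NA and Colv = NA keep the original row/column order (no dendrograms);
# scale = "none" colors the raw values instead of row-scaled ones.
# heatmap() invisibly returns the row and column order it used.
h <- heatmap(m, Rowv = NA, Colv = NA, scale = "none",
             col = heat.colors(12),
             main = "Unclustered, unscaled heat map")
```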

Map Visualization

The maps package can be used to visualize and display geographical maps in R.

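# Install the package (needed only once)
install.packages("maps")
# Read the dataset; read.csv() already returns a data frame.
# worldcities.csv must be in the working directory and have lat and lng columns.
df <- read.csv("worldcities.csv")
# Load the required library and draw a world map
library(maps)
map(database = "world")
# Mark the first 500 cities: x is longitude, y is latitude
points(x = df$lng[1:500], y = df$lat[1:500], col = "red")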

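If worldcities.csv is not at hand, the maps package bundles its own world.cities data set. A minimal sketch, assuming maps is installed, that marks national capitals; it also shows the argument order for points(), where x is longitude and y is latitude.

```r
# Assumes the maps package is installed: install.packages("maps")
library(maps)

# world.cities is bundled with the maps package; its coordinate columns
# are named long and lat, and capital == 1 marks national capitals
data(world.cities, package = "maps")
capitals <- subset(world.cities, capital == 1)

map(database = "world")
# x is longitude (east-west), y is latitude (north-south)
points(x = capitals$long, y = capitals$lat, col = "red", pch = 20, cex = 0.6)
```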

3D Graphics

The persp() function creates 3D surfaces in perspective view. It draws a perspective plot of a surface over the x–y plane.

Syntax: persp(x, y, z)

Parameters: x and y are vectors giving the locations of the grid lines along the x- and y-axes, and z is a matrix giving the height of the surface above each (x, y) grid point.

Return value: persp() invisibly returns the viewing transformation matrix, a 4 x 4 matrix for projecting 3D coordinates (x, y, z) into the 2D plane using homogeneous 4D coordinates (x, y, z, t).


# Height of a cone at point (x, y)

cone <- function(x, y){

sqrt(x ^ 2 + y ^ 2)

}

# prepare variables.

x <- y <- seq(-1, 1, length = 30)

z <- outer(x, y, cone)

# plot the 3D surface

# Adding Titles and Labeling Axes to Plot

persp(x, y, z,

main="Perspective Plot of a Cone",

zlab = "Height",

theta = 30, phi = 15,

col = "orange", shade = 0.4)
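The viewing matrix returned by persp() can be passed to trans3d() to project 3D points onto the drawn surface. A minimal sketch, rebuilding the same cone, that marks the cone's tip; the marker styling is an illustrative choice.

```r
cone <- function(x, y) sqrt(x^2 + y^2)
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, cone)

# persp() invisibly returns the 4 x 4 viewing transformation matrix
pmat <- persp(x, y, z, theta = 30, phi = 15, col = "orange", shade = 0.4,
              main = "Cone with its tip marked")

# trans3d() uses that matrix to project a 3D point onto the 2D plot,
# so ordinary points() can be drawn on top of the surface
tip <- trans3d(0, 0, 0, pmat)
points(tip, col = "blue", pch = 19)
```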


Advantages of Data Visualization in R: 

R has the following advantages over other tools for data visualization: 

  • R offers a broad collection of visualization libraries along with extensive online guidance on their usage.
  • R also offers data visualization in the form of 3D models and multipanel charts.
  • Through R, we can easily customize our data visualization by changing axes, fonts, legends, annotations, and labels.
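As a sketch of multipanel charts with customized labels in base R, par(mfrow = ...) splits the plotting device into a grid of panels; the two plots chosen here are only examples.

```r
data(airquality)

# Split the device into 1 row x 2 columns; par() returns the old settings
op <- par(mfrow = c(1, 2))

hist(airquality$Temp, col = "yellow",
     main = "Temperature", xlab = "Fahrenheit")
boxplot(airquality$Wind, col = "orange",
        main = "Wind speed", ylab = "Miles per hour")

par(op)  # restore the previous single-panel layout
```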

Disadvantages of Data Visualization in R:

R also has the following disadvantages: 

  • R is mainly preferred for data visualization carried out on an individual, standalone server.
  • Data visualization in R can be slow for large amounts of data compared with other tools.


Application Areas: 

  • Presenting analytical conclusions about the data to non-analyst departments of your company.
  • Health monitoring devices use data visualization to track any anomaly in blood pressure, cholesterol and others.
  • To discover repeating patterns and trends in consumer and marketing data.
  • Meteorologists use data visualization for assessing prevalent weather changes throughout the world.
  • Real-time maps and geo-positioning systems use visualization for traffic monitoring and estimating travel time.