Visualization
NAME – PRIYANKA LOKHANDE.
DATA VISUALIZATION
Data visualization is the practice of representing information and data in the form of visual charts, diagrams and pictures. It means translating complex, large-volume numerical data into diagrams so that it is easier to understand.
ggplot2, an implementation of the Grammar of Graphics for the R programming language, is a free, open-source and easy-to-use visualization package that is widely used in R.
It is one of the most powerful visualization packages available for R.
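The plotting functions used in the rest of this document (barplot(), hist(), boxplot(), plot(), heatmap() and persp()) come from base R rather than from ggplot2. For comparison, here is a minimal sketch of the Grammar of Graphics style, assuming ggplot2 is installed: a plot is built by combining a data set, aesthetic mappings (which column goes on which axis) and one or more geometric layers.

```r
# Assumes ggplot2 is installed, e.g. via install.packages("ggplot2")
library(ggplot2)
data(airquality)

# Grammar of Graphics: data + aesthetic mappings + a geometric layer + labels
p <- ggplot(airquality, aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +   # silently drop rows where Ozone is missing
  labs(title = "Ozone vs. daily maximum temperature",
       x = "Temperature (degrees F)", y = "Ozone (ppb)")

print(p)  # a ggplot object is drawn when it is printed
```

Further layers (for example a smoother or facets) are added with the same `+` operator, which is the main design idea behind the package.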
In R, data visualization tools provide an accessible way to see and understand trends, patterns and outliers in data. Data visualization tools and technologies are essential for analyzing large amounts of information and making data-driven decisions. The idea of using pictures to understand data has been around for centuries.
Common ways of presenting data visually include charts, tables, graphs, maps and dashboards.
General types of data visualization are:
Bar Plot
Histogram
Box Plot
Scatter Plot
Heat Map
Map Visualization
3D Graphics
Bar Plot
There are two types of bar plots:
Vertical
Horizontal
In a bar plot, the length of each bar is proportional to the value of the data item it represents. Bar plots are generally used for plotting categorical and continuous variables.
Setting the horiz parameter of barplot() to TRUE or FALSE gives a horizontal or a vertical bar plot, respectively.
Consider the built-in airquality data set, which contains daily air quality measurements taken in New York from May to September 1973. Its first six rows are:
Ozone (ppb) | Solar.R (Langleys) | Wind (mph) | Temp (°F) | Month | Day
41 | 190 | 7.4 | 67 | 5 | 1
36 | 118 | 8.0 | 72 | 5 | 2
12 | 149 | 12.6 | 74 | 5 | 3
18 | 313 | 11.5 | 62 | 5 | 4
NA | NA | 14.3 | 56 | 5 | 5
28 | NA | 14.9 | 66 | 5 | 6
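Because airquality ships with base R (in the datasets package), no file is needed to reproduce the table. A minimal sketch that loads the data, shows the same first rows and checks its size and how many values are missing in each column:

```r
# airquality is part of base R's datasets package
data(airquality)

head(airquality)              # first six rows, as in the table
dim(airquality)               # number of rows and columns
colSums(is.na(airquality))    # count of missing (NA) values per column
```

Knowing where the NA values are helps explain gaps in the plots, since most base plotting functions simply skip missing values.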
# Load the built-in airquality data set
data(airquality)
# Horizontal bar plot of ozone concentration in air
barplot(airquality$Ozone,
        main = 'Ozone Concentration in Air',
        xlab = 'Ozone level (ppb)', horiz = TRUE)
# Vertical bar plot of ozone concentration in air;
# the values now run along the y-axis, so label that axis
barplot(airquality$Ozone,
        main = 'Ozone Concentration in Air',
        ylab = 'Ozone level (ppb)', col = 'blue', horiz = FALSE)
Output: a horizontal bar plot and a vertical (blue) bar plot, with one bar per daily ozone reading.
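Plotting every daily reading gives one bar per day. To illustrate the categorical use of bar plots, the following sketch (one possible approach) summarises ozone by month with tapply() and draws one labelled horizontal bar per month; the names monthly_ozone and bp are illustrative.

```r
data(airquality)

# Mean ozone for each month (5 = May ... 9 = September); NA values are ignored
monthly_ozone <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)

# One labelled bar per month; horiz = TRUE lays the bars out horizontally
bp <- barplot(monthly_ozone,
              names.arg = month.abb[5:9],   # "May" ... "Sep"
              horiz = TRUE, las = 1,        # las = 1 keeps axis labels upright
              col = "steelblue",
              main = "Mean Ozone Concentration by Month",
              xlab = "Mean ozone (ppb)")
```

barplot() invisibly returns the bar midpoints (stored in bp), which can be used to place text labels next to the bars.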
Histogram
A histogram is similar to a bar chart in that it uses bars of varying height, but it represents the distribution of the data: values are grouped into consecutive intervals called bins, and the height of each bar shows how many values fall into that bin.
In a histogram, continuous values are grouped and displayed in these bins, whose width can be varied.
# Histogram of maximum daily temperature
data(airquality)
hist(airquality$Temp,
     main = "La Guardia Airport's\nMaximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)
Output: a yellow histogram of the daily maximum temperatures.
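The grouping into bins can be inspected without drawing anything: hist(..., plot = FALSE) returns the break points and the count in each bin. This sketch uses 10-degree bins (an arbitrary choice for illustration), cross-checks the counts with cut(), then redraws the histogram with narrower bins to show how the bin width changes the picture.

```r
data(airquality)

# Fixed 10-degree bins from 50 to 100 F; plot = FALSE only computes the bins
h <- hist(airquality$Temp, breaks = seq(50, 100, by = 10), plot = FALSE)
h$breaks   # bin boundaries
h$counts   # number of days falling in each bin

# Same counts with cut(): both use right-closed intervals (a, b]
# (no temperature equals the lowest break, so the edge case does not arise)
table(cut(airquality$Temp, breaks = seq(50, 100, by = 10)))

# Narrower 2.5-degree bins give a more detailed view of the distribution
hist(airquality$Temp, breaks = seq(50, 100, by = 2.5), col = "yellow",
     main = "Daily Maximum Temperature, 2.5-degree bins",
     xlab = "Temperature (Fahrenheit)")
```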
Box Plot
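A box plot presents a statistical summary of the data graphically.
It shows the minimum and maximum data points (excluding outliers), the median, the first and third quartiles, and the interquartile range (the length of the box). In R's boxplot(), the whiskers extend to the most extreme data points lying within 1.5 times the interquartile range of the box, and points beyond that are drawn individually as outliers.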
# Box plot of average daily wind speed
data(airquality)
boxplot(airquality$Wind,
        main = "Average Wind Speed\nat La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE, notch = TRUE)
Output: a horizontal, notched orange box plot of wind speed.
Multiple box plots can also be drawn at once by passing several columns of a data frame:
# Multiple box plots, one for each air quality parameter
# (columns 1 to 4: Ozone, Solar.R, Wind, Temp)
boxplot(airquality[, 1:4],
        main = 'Box Plots for Air Quality Parameters')
Output: side-by-side box plots of Ozone, Solar.R, Wind and Temp.
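The numbers behind a box plot can also be printed. A minimal sketch for the wind-speed data follows; note that boxplot() draws Tukey's hinges from fivenum(), which are close to, but not always identical to, the quartiles reported by quantile() or summary(). The names wind and s are illustrative.

```r
data(airquality)
wind <- airquality$Wind

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(wind)

# The values boxplot() actually draws (default coef = 1.5)
s <- boxplot.stats(wind)
s$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
s$out     # points beyond the whisker limits, drawn individually as outliers

# Interquartile range based on quantile() (type 7), for comparison
IQR(wind)
```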
Scatter Plot
A scatter plot consists of many points on a Cartesian plane. Each point represents the values taken by two variables, which makes it easy to identify the relationship between them.
# Scatter plot of ozone concentration against month
data(airquality)
plot(airquality$Ozone, airquality$Month,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Month of observation", pch = 19)
Output: a scatter plot of observation month against ozone concentration.
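Month takes only five values, so that example shows the spread of ozone within each month rather than a relationship between two continuous variables. As a sketch of the latter (the choice of Temp is illustrative), the following plots ozone against temperature on days where both were recorded, adds a least-squares line, and reports Pearson's correlation coefficient as a numeric summary of the relationship.

```r
data(airquality)

# Keep only days on which both variables were recorded
ok    <- complete.cases(airquality$Temp, airquality$Ozone)
temp  <- airquality$Temp[ok]
ozone <- airquality$Ozone[ok]

plot(temp, ozone, pch = 19,
     main = "Ozone vs. Temperature",
     xlab = "Daily maximum temperature (Fahrenheit)",
     ylab = "Ozone concentration (ppb)")

# Least-squares straight line through the points
abline(lm(ozone ~ temp), col = "red", lwd = 2)

# Pearson correlation: close to +1 means a strong increasing linear relationship
r <- cor(temp, ozone)
r
```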
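Heat Map
A heat map is a graphical representation of data that uses colors to visualize the values of a matrix. The heatmap() function is used to plot a heat map; by default it also reorders the rows and columns by hierarchical clustering (drawing dendrograms in the margins) and scales the values within each row.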
# Set seed for reproducibility
set.seed(110)
# Create example data: a 5 x 5 matrix of normal random numbers (mean 0, sd 5)
data <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5)
# Column and row names
colnames(data) <- paste0("col", 1:5)
rownames(data) <- paste0("row", 1:5)
# Draw a heatmap
heatmap(data)
Output: a heat map of the 5 x 5 matrix with row and column dendrograms.
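A random matrix is only a placeholder. As a sketch with real data, the following builds a 5 x 4 matrix of monthly means from airquality and turns off the default clustering so that the months stay in calendar order. Scaling by column is an assumption suited to this data: the four variables are measured in different units, so each one is standardized before being mapped to colors. The names vars and m are illustrative.

```r
data(airquality)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# 5 x 4 matrix: monthly mean of each variable, ignoring missing values
m <- sapply(vars, function(v)
  tapply(airquality[[v]], airquality$Month, mean, na.rm = TRUE))
rownames(m) <- month.name[5:9]

# Rowv/Colv = NA keep the original order and suppress the dendrograms;
# scale = "column" standardizes each variable before coloring
heatmap(m, Rowv = NA, Colv = NA, scale = "column",
        col = heat.colors(12), margins = c(6, 8))
```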
Map Visualization
The maps package can be used to draw geographical maps in R, on which data points such as city locations can then be marked.
# Install the maps package once (uncomment if needed)
# install.packages("maps")
# Load the required library
library(maps)
# Read the dataset; read.csv() already returns a data frame
df <- read.csv("worldcities.csv")
# Draw a world map
map(database = "world")
# Mark the first 500 cities: longitude on the x-axis, latitude on the y-axis
points(x = df$lng[1:500], y = df$lat[1:500], col = "red")
Output: a world map with city locations marked as red points.
3D Graphics
The persp() function creates 3D surfaces in perspective view: it draws a perspective plot of a surface over the x-y plane.
Syntax: persp(x, y, z)
Parameters: x and y are vectors giving the locations of the grid lines along the x- and y-axes, and z is a matrix whose entry z[i, j] is the height of the surface at the point (x[i], y[j]).
Return Value: persp() returns the viewing transformation matrix for projecting 3D coordinates (x, y, z) into the 2D plane using homogeneous 4D coordinates (x, y, z, t).
# Define the cone surface: height equals the distance from the origin
cone <- function(x, y) {
  sqrt(x ^ 2 + y ^ 2)
}
# Prepare variables: a 30 x 30 grid over [-1, 1] x [-1, 1]
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, cone)
# Plot the 3D surface, adding a title and labelling the z-axis
persp(x, y, z,
      main = "Perspective Plot of a Cone",
      zlab = "Height",
      theta = 30, phi = 15,
      col = "orange", shade = 0.4)
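Because persp() returns the viewing transformation matrix, additional 3D points and lines can be drawn on the same plot with trans3d(). The sketch below reuses the cone, marks its apex at (0, 0, 0) and traces the circle at height 1 around it; the names pmat, apex and angle are illustrative.

```r
cone <- function(x, y) sqrt(x^2 + y^2)
x <- y <- seq(-1, 1, length.out = 30)
z <- outer(x, y, cone)

# persp() invisibly returns the 4 x 4 viewing transformation matrix
pmat <- persp(x, y, z, theta = 30, phi = 15,
              col = "orange", shade = 0.4,
              main = "Cone with its apex marked")

# trans3d() uses that matrix to project 3D points onto the 2D plotting region
apex <- trans3d(0, 0, 0, pmat)
points(apex, pch = 19, col = "blue")

# Trace the circle of height 1 (x^2 + y^2 = 1, z = 1) in the same view
angle <- seq(0, 2 * pi, length.out = 100)
lines(trans3d(cos(angle), sin(angle), rep(1, 100), pmat), col = "blue")
```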
Advantages of Data Visualization in R:
R has the following advantages over other tools for data visualization:
Disadvantages of Data Visualization in R:
R also has the following disadvantages:
Application Areas: