Big Data Modeling with R
About Me
Elijah Appiah from Ghana.
Ph.D. Economics at NIDA in Bangkok, Thailand.
Economist by profession and Data Scientist by passion.
Enthusiastic about working with data daily.
Technical skills in LaTeX, Microsoft Office (Word, Excel, PowerPoint), SPSS, Stata, EViews, Python, R, MATLAB, Power BI, Tableau, and Google TensorFlow.
IDE and Packages for this Lecture
benchmarkme
tidyverse
data.table
sparklyr
tidymodels
GGally
ggExtra
What is Big Data?
Source: https://www.techtarget.com/whatis/definition/3Vs
Why is Big Data Important?
Analyzing Big Data – Steps
Analyzing Big Data
https://www.datastackhub.com/top-tools/open-source-big-data-tools/
Analyzing Big Data – Suggestions
Now, let’s practice
Analyzing Big Data – Benchmarking
benchmark_std() {benchmarkme}
benchmark_io() {benchmarkme}
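The two functions named above can be tried with a minimal sketch like the one below. It assumes the `benchmarkme` package is installed; `runs = 1` and `size = 5` (a 5 MB file) are choices made here to keep the run short, not settings taken from the lecture.

```r
library(benchmarkme)

# Basic hardware information for the current machine
print(get_cpu()$model_name)
print(get_ram())

# Standard benchmarks (programming, matrix functions, matrix calculations).
# runs = 1 is an assumption to keep this quick.
res_std <- benchmark_std(runs = 1)

# Read/write benchmark using a 5 MB file in a temporary directory
res_io <- benchmark_io(runs = 1, size = 5)

# Each result is a data frame with one row per test and its timings
print(head(res_std))
print(head(res_io))
```

Timings depend entirely on the machine, so the useful comparison is between your own runs before and after a change in hardware or setup.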
Now, let’s practice
Analyzing Big Data – Exploratory Data Analysis (EDA)
Framework for data science:
Source: Wickham & Grolemund (2017)
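As a rough illustration of that framework (import, tidy, transform, visualise), the sketch below walks a small built-in dataset through the first stages. The dataset (`mtcars`) and the chosen columns are assumptions for illustration only; a real project would import its own data.

```r
library(tidyverse)

# Import: a built-in dataset stands in for reading a real file
cars <- as_tibble(mtcars, rownames = "model")

# Tidy: reshape to one row per (model, measure) pair
cars_long <- cars %>%
  select(model, mpg, hp, wt) %>%
  pivot_longer(-model, names_to = "measure", values_to = "value")

# Transform: summary statistics for each measure
summary_tbl <- cars_long %>%
  group_by(measure) %>%
  summarise(n = n(), mean = mean(value), sd = sd(value), .groups = "drop")

print(summary_tbl)

# Visualise: one histogram per measure
p_eda <- ggplot(cars_long, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ measure, scales = "free_x")
```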
Analyzing Big Data – Data Wrangling with `data.table` Package
data.table {data.table}
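A minimal sketch of the general `DT[i, j, by]` form used by `data.table` follows. The simulated table and its column names (`id`, `group`, `value`) are hypothetical and exist only to make the example self-contained.

```r
library(data.table)

set.seed(42)
n <- 1e6

# Simulated table standing in for a large dataset
dt <- data.table(
  id    = 1:n,
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  value = rnorm(n)
)

# DT[i, j, by]: filter rows (i), compute (j), grouped (by)
res <- dt[value > 0, .(n = .N, mean_value = mean(value)), by = group][order(group)]
print(res)

# Add a column by reference, without copying the table
dt[, value_sq := value^2]

# Set a key, then subset on it with a fast binary search
setkey(dt, group)
sub_a <- dt[.("a")]
```

Updating by reference with `:=` avoids copying the whole table, which is one reason `data.table` tends to scale well on a single machine.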
Analyzing Big Data
Source: https://tinyurl.com/y366kvfx
Now, let’s practice
Analyzing Big Data – The `sparklyr` Package
https://www.oracle.com/java/technologies/downloads/?er=221886#java8
install.packages("sparklyr")
library(sparklyr)
spark_install()
The `dplyr` syntax
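The sketch below shows the idea behind using `dplyr` syntax with Spark: the same pipeline runs on a local data frame and on a Spark table, where `sparklyr` translates it to Spark SQL. The helper `summarise_by_cyl()` and the table name `mtcars_spark` are hypothetical names introduced here. The Spark part runs only if `sparklyr` and a local Spark installation are found, and it also needs a working Java installation.

```r
library(dplyr)

# Hypothetical helper: one dplyr pipeline for local and Spark tables
summarise_by_cyl <- function(tbl) {
  tbl %>%
    group_by(cyl) %>%
    summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

# Local data frame
local_result <- summarise_by_cyl(mtcars)
print(local_result)

# Spark table: executed only when a local Spark install is available
if (requireNamespace("sparklyr", quietly = TRUE) &&
    nrow(sparklyr::spark_installed_versions()) > 0) {
  sc <- sparklyr::spark_connect(master = "local")

  # Copy the data into Spark and get a reference to the remote table
  cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

  # The work happens in Spark; collect() brings the result back into R
  spark_result <- cars_tbl %>% summarise_by_cyl() %>% collect()
  print(spark_result)

  sparklyr::spark_disconnect(sc)
}
```

Calling `collect()` only on the summarised result keeps the large table in Spark and moves just a few rows into R.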
Now, let’s practice
Visualizing Big Data – The `ggplot2` Package
{GGally}, {ggExtra}, {ggalluvial}
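With many rows, a plain scatterplot overplots badly. Two common workarounds, sketched below on simulated data, are to bin the points or to plot a random sample. The data, the bin count, and the sample size of 2,000 are all assumptions made for this example.

```r
library(ggplot2)
library(GGally)

set.seed(1)
n <- 200000

# Simulated data standing in for a large dataset
big <- data.frame(x = rnorm(n))
big$y <- 2 * big$x + rnorm(n)
big$z <- runif(n)

# Option 1: bin the points, so colour shows how many rows fall in each cell
p_bin <- ggplot(big, aes(x = x, y = y)) +
  geom_bin2d(bins = 60) +
  labs(title = "Binned scatterplot", fill = "Rows")

# Option 2: draw a random sample, then use a pairs plot from GGally
small <- big[sample(nrow(big), 2000), ]
p_pairs <- ggpairs(small)

# print(p_bin) or print(p_pairs) renders the plots
```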
Now, let’s practice
Modeling Big Data – Machine Learning (ML)
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions.
Source: Wikipedia
Machine Learning Algorithms
Supervised ML
Unsupervised ML
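The difference between the two families can be sketched in base R: a supervised model learns from a labelled outcome, while an unsupervised one looks for structure without labels. Linear regression and k-means are used here only as familiar representatives; the choice of datasets and variables is an assumption for illustration.

```r
set.seed(123)

# Supervised: the outcome (mpg) is known and predicted from features
sup_fit  <- lm(mpg ~ wt + hp, data = mtcars)
sup_pred <- predict(sup_fit, newdata = data.frame(wt = 3, hp = 110))
print(sup_pred)

# Unsupervised: the species labels are NOT used to form the clusters
features <- scale(iris[, 1:4])
km <- kmeans(features, centers = 3, nstart = 20)

# Labels are used only afterwards, to inspect what the clusters captured
print(table(cluster = km$cluster, species = iris$Species))
```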
Model Evaluation – Classification
Confusion Matrix for Classification Problems
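A small worked example of a confusion matrix and the metrics usually derived from it is sketched below in base R. The ten labels are made up for illustration, with "yes" treated as the positive class.

```r
# Made-up labels; "yes" is the positive class
actual <- factor(
  c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
  levels = c("yes", "no")
)
predicted <- factor(
  c("yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"),
  levels = c("yes", "no")
)

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- cm["yes", "yes"]  # true positives
fp <- cm["yes", "no"]   # false positives
fn <- cm["no", "yes"]   # false negatives
tn <- cm["no", "no"]    # true negatives

accuracy    <- (tp + tn) / sum(cm)
precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)   # also called sensitivity
specificity <- tn / (tn + fp)
f1          <- 2 * precision * recall / (precision + recall)

print(c(accuracy = accuracy, precision = precision, recall = recall,
        specificity = specificity, f1 = f1))
```

Here TP = 3, FP = 1, FN = 1 and TN = 5, so accuracy is 0.8 while precision, recall and F1 are all 0.75.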
Model Evaluation – Regression
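For regression, commonly reported metrics include MAE, MSE, RMSE and R-squared. The sketch below computes them in base R on four made-up values; the numbers are illustrative only.

```r
# Made-up observed and predicted values
actual    <- c(3, 5, 7, 9)
predicted <- c(2, 5, 8, 11)

errors <- actual - predicted        # 1, 0, -1, -2

mae  <- mean(abs(errors))           # mean absolute error
mse  <- mean(errors^2)              # mean squared error
rmse <- sqrt(mse)                   # root mean squared error

# R-squared: 1 minus residual sum of squares over total sum of squares
r2 <- 1 - sum(errors^2) / sum((actual - mean(actual))^2)

print(c(MAE = mae, MSE = mse, RMSE = rmse, R2 = r2))
```

With these values MAE is 1, MSE is 1.5, RMSE is about 1.22, and R-squared is 0.7.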
Typical Machine Learning Pipeline
Key Issues in ML
Modeling Big Data – Machine Learning (ML)
tidymodels
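A typical pipeline (split, preprocess, specify a model, fit, predict, evaluate) could look like the minimal `tidymodels` sketch below. The dataset (`mtcars`), the predictors, the 80/20 split and the choice of linear regression are assumptions for illustration, not the lecture's project setup.

```r
library(tidymodels)

set.seed(123)

# 1. Split the data into training and test sets
cars_split <- initial_split(mtcars, prop = 0.8)
cars_train <- training(cars_split)
cars_test  <- testing(cars_split)

# 2. Preprocessing recipe: normalise the numeric predictors
cars_rec <- recipe(mpg ~ wt + hp, data = cars_train) %>%
  step_normalize(all_numeric_predictors())

# 3. Model specification: linear regression using the lm engine
lm_spec <- linear_reg() %>%
  set_engine("lm")

# 4. Bundle recipe and model into a workflow, then fit on training data
cars_wf <- workflow() %>%
  add_recipe(cars_rec) %>%
  add_model(lm_spec)

cars_fit <- fit(cars_wf, data = cars_train)

# 5. Predict on the held-out test set and evaluate
cars_pred <- predict(cars_fit, new_data = cars_test) %>%
  bind_cols(cars_test)

results <- metrics(cars_pred, truth = mpg, estimate = .pred)
print(results)
```

Because the recipe is part of the workflow, the normalisation parameters are estimated on the training set only and reused at prediction time, which helps avoid leaking test-set information.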
Big Data Modeling – Project Time
TIME FOR A PROJECT!
Now, let’s practice
THANK YOU