1 of 6

Make Learning Data Science fun

A reasonable timeline for learning Data Science

2 of 6

Month 1-2

Familiarize yourself with the basics of programming in R, Python, and SQL (nothing deep, just the fundamentals)

Read articles about the job market

Take a course on Exploratory Data Analysis and Data Validation (Excel)

Learn descriptive statistics - measures of dispersion and measures of central tendency

Play around with some visualizations to discover fun insights (Tableau, Excel, Power BI)
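As a small illustration of the descriptive statistics mentioned above, here is a minimal sketch using only Python's built-in statistics module. The sales list is made-up data, and the variable names are chosen just for this example.

```python
import statistics

# Hypothetical sample: daily sales counts (made-up numbers for illustration).
sales = [12, 15, 15, 18, 20, 22, 45]

# Measures of central tendency
mean = statistics.mean(sales)      # arithmetic average
median = statistics.median(sales)  # middle value of the sorted data
mode = statistics.mode(sales)      # most frequent value

# Measures of dispersion
value_range = max(sales) - min(sales)  # spread between the extremes
variance = statistics.variance(sales)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sales)      # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} variance={variance:.2f} std_dev={std_dev:.2f}")
```

Note how the single large value (45) pulls the mean above the median. Spotting that kind of gap is one of the quick insights descriptive statistics can give you.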

3 of 6

Month 3-6 (this is ongoing)

To Do (with the relevant tools for Python and for R)

Download an IDE for your preferred programming language.
  • Python: Spyder, VS Code, Jupyter
  • R: RStudio, Jupyter (I prefer doing all my analysis in Jupyter, even when using R)

Understand the basics of different data types (strings, integers, floats) and data structures.
  • N/A

Get familiar with "data" libraries.
  • Python: NumPy, pandas, Matplotlib, statsmodels
  • R: dplyr, tidyverse, ggplot2, caret

Build basic visualizations.
  • Python: Matplotlib, Plotly (my absolute favorite), Bokeh
  • R: ggplot2, Plotly, Bokeh

Manipulate and wrangle data with SQL/pandas/R.
  • Python: pandas
  • R: tidyverse

Dig deeper into statistical concepts (correlation analysis, hypothesis testing, distribution types, linear regression) and learn how these techniques fit into the data analysis ecosystem.
  • Python: statsmodels, SciPy
  • R: car, ggpubr (truth be told, I slightly prefer R for statistical analysis)

4 of 6

Month 6-9

  • Supervised vs Unsupervised learning
  • Classification vs Regression
  • Overfitting, Underfitting and everything in between
  • Simple Classification models – KNN, Logistic Regression, Decision Trees, etc.
  • Simple Regression Techniques – Linear Regression, Polynomial Regression, Exponential Regression, other General Linear Models
  • Create a project on Classification and/or Regression
  • Learn the concepts of version control – git
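As a taste of the simple classification models listed above, here is a from-scratch sketch of K-Nearest Neighbors (KNN) in plain Python. The function name knn_predict and the toy data are assumptions made for this example. In a real project you would more likely reach for an established machine learning library.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest training points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Return the most common label among those neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (6.8, 6.2)))
```

The choice of k ties back to overfitting and underfitting: a very small k follows individual training points closely, while a very large k smooths over real structure in the data.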

5 of 6

Month 9-18

More Statistical Concepts

    • Hypothesis Testing
    • Applications of Hypothesis Tests (A/B testing)
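To show how hypothesis testing applies to an A/B test, here is a minimal sketch of a two-proportion z-test using only the standard library. The function name two_proportion_z_test and the conversion counts are made up for illustration. This is one common way to analyze an A/B test, not the only one.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference between the two proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 2000 visitors,
# variant B converts 250 of 2000 visitors.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

If the p-value falls below the significance level you chose before running the experiment (0.05 is a common choice), you would reject the hypothesis that both variants convert at the same rate.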

More Supervised Learning

    • Tree-based models
      • Ensemble models
      • Boosted Trees
    • Support Vector Machines

Unsupervised Models

    • Clustering algorithms
      • K-means Clustering
      • Mean-Shift Clustering Algorithm
      • DBSCAN – Density-Based Spatial Clustering of Applications with Noise
      • Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
      • Hierarchical Clustering
    • Apriori
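To give a feel for how the first clustering algorithm in this list works, here is a from-scratch sketch of K-means in plain Python. The function name k_means, the fixed starting centroids, and the toy points are assumptions made for this example. Library implementations add smarter initialization and many other refinements.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Cluster `points` by alternating assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old_centroid in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
                )
            else:
                # Keep the old centroid if no points were assigned to it.
                new_centroids.append(old_centroid)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two obvious groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

Unlike the supervised models earlier in the timeline, no labels are supplied here: the groups emerge only from the distances between points.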

6 of 6

Month 18-24

High-level familiarity with:

Autoencoders

Convolutional Neural Networks (CNNs)

Recurrent Neural Networks (RNNs)

And their applications (not limited to):

Text Mining

Image Classification