1 of 28

Data Analysis with Python

Full tutorial for beginners

2 of 28

Hands-on, online Data Science training.

3 of 28

4 of 28

About this tutorial

  1. What is Data Analysis
  2. Real example: Data Analysis with Python
  3. How to use Jupyter Notebooks
  4. Intro to NumPy (exercises included)
  5. Intro to Pandas (exercises included)
  6. Data Cleaning
  7. Reading data from SQL, CSVs, APIs, etc.
  8. Python in Under 10 Minutes

5 of 28

What is Data Analysis?

6 of 28

What is Data Analysis

> A process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

Definition by Wikipedia.

11 of 28

Data Analysis Tools

12 of 28

Auto-managed closed tools

Programming Languages

13 of 28

Auto-managed closed tools

👎 Closed Source 🙅‍♂️

👎 Expensive 💸

👎 Limited 😩

👍 Easy to learn 👩‍💻

Programming Languages

👍 Open Source 🤩

👍 Free (or very cheap) 🤑

👍 Extremely Powerful 💪

👎 Steep learning curve 👩‍💻

14 of 28

Why Python for Data Analysis?

15 of 28

Why Python for Data Analysis?

Why would we choose Python over R or Julia?

👍 very simple and intuitive to learn

👍 “correct” language

👍 powerful libraries (not just for Data Analysis)

👍 free and open source

👍 amazing community, docs and conferences

16 of 28

When to choose R?

Python, sadly, is not always the answer

  • When RStudio is needed
  • When dealing with advanced statistical methods
  • When extreme performance is needed

17 of 28

The Data Analysis Process

18 of 28

Data Extraction

  • SQL
  • Scraping
  • File Formats
    • CSV
    • JSON
    • XML
  • Consulting APIs
  • Buying Data
  • Distributed Databases

Data Cleaning

  • Missing values and empty data
  • Data imputation
  • Incorrect types
  • Incorrect or invalid values
  • Outliers and non-relevant data
  • Statistical sanitization

Data Wrangling

  • Hierarchical Data
  • Handling categorical data
  • Reshaping and transforming structures
  • Indexing data for quick access
  • Merging, combining and joining data

Analysis

  • Exploration
  • Building statistical models
  • Visualization and representations
  • Correlation vs Causation analysis
  • Hypothesis testing
  • Statistical analysis
  • Reporting

Action

  • Building Machine Learning Models
  • Feature Engineering
  • Moving ML into production
  • Building ETL pipelines
  • Live dashboard and reporting
  • Decision making and real-life tests
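
The extraction, cleaning, wrangling and analysis stages of this process can be sketched in a few lines of pandas. This is only a minimal illustration, not the tutorial's own example: the inline CSV text, the column names (`order_id`, `country`, `amount`) and the variable names are all made up for the sketch.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for an extracted CSV file.
RAW_CSV = """order_id,country,amount
1,US,10.5
2,us,
3,FR,7.0
4,FR,abc
5,US,12.0
"""

# Data extraction: read the CSV text into a DataFrame.
orders = pd.read_csv(io.StringIO(RAW_CSV))

# Data cleaning: fix incorrect types, normalize values, drop invalid rows.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")
orders["country"] = orders["country"].str.upper()
orders = orders.dropna(subset=["amount"])

# Data wrangling: treat country as categorical and index rows by order id.
orders["country"] = orders["country"].astype("category")
orders = orders.set_index("order_id")

# Analysis: a simple per-country summary.
summary = orders.groupby("country", observed=True)["amount"].agg(["count", "mean"])
print(summary)
```

Dropping the rows with missing or invalid amounts is one possible cleaning choice; imputing a value instead would be an equally valid decision depending on the data.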

19 of 28

Data Analysis vs Data Science

20 of 28

DATA ANALYSIS VS DATA SCIENCE

The traditional view

21 of 28

Python & PyData Ecosystem

22 of 28

PYTHON ECOSYSTEM:

The libraries we use...

  • pandas: The cornerstone of our Data Analysis job with Python.
  • matplotlib: The foundational library for visualizations. Other libraries we’ll use are built on top of it.
  • numpy: The numeric library that serves as the foundation for most numerical computing in Python, including pandas.
  • seaborn: A statistical visualization tool built on top of matplotlib.
  • statsmodels: A library with many advanced statistical functions.
  • scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing and much more.
  • scikit-learn: The most popular machine learning library for Python (not deep learning).
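
To give a feel for how the first of these libraries fit together, here is a small sketch using only numpy and pandas. The product names and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# numpy: element-wise arithmetic over whole arrays, with no explicit loop.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])
revenue = prices * quantities

# pandas: labeled, tabular data built on top of NumPy arrays.
sales = pd.DataFrame({"product": ["a", "b", "c"], "revenue": revenue})

# Pick the row with the highest revenue and compute the total.
top = sales.sort_values("revenue", ascending=False).iloc[0]
total = sales["revenue"].sum()
print(top["product"], total)
```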

23 of 28

How Python Data Analysts Think

24 of 28

EXCEL, TABLEAU, ETC.

They’re all visual tools...
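
By contrast, in Python the data is usually inspected through code rather than by scrolling through a grid. A minimal sketch, assuming pandas and a made-up table (the `city` and `temp_c` columns are hypothetical):

```python
import pandas as pd

# Hypothetical table; in practice this would come from a file or database.
df = pd.DataFrame(
    {
        "city": ["Lima", "Oslo", "Lima", "Cairo"],
        "temp_c": [19.5, 4.0, 21.0, 30.5],
    }
)

print(df.shape)                   # number of rows and columns
print(df.dtypes)                  # type of each column
print(df["temp_c"].describe())    # summary statistics for one column
print(df["city"].value_counts())  # how often each value appears
```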

25 of 28

Thinking like a Python Data Analyst

26 of 28

And finally, why Python?

27 of 28

28 of 28

About this tutorial

  • What is Data Analysis
  • Real example: Data Analysis with Python
  • How to use Jupyter Notebooks
  • Intro to NumPy (exercises included)
  • Intro to Pandas (exercises included)
  • Data Cleaning
  • Reading data from SQL, CSVs, APIs, etc.
  • Python in Under 10 Minutes