Royal Government of Bhutan, Ministry of Education and Skills Development

Department of School Education

Ministry of Education & Skills Development

Online Training for ICT Teachers

28 February 2023

Classes XI & XII ICT Curriculum

Data Science SESSION II

Data Science Part II

Presentation Outline

Discussion

  • ICT Curriculum
  • Introduction to Pandas

Activity

  • Series and DataFrame
  • Data Analysing Methods
  • Data Referencing & Cleaning Methods
  • Statistical Data Analysing Methods

Explain

  • Quick review of the Pandas Activity Book

ICT Curriculum

Objectives

  • Import the Pandas library
  • Explain the structure of a DataFrame
  • Clean the dataset
  • Filter the dataset
  • Generate statistical analysis

Competency

Present a visual representation of a dataset by applying data analysis modules in a programming language to communicate a message.

Content Scope

  • Introduction to Pandas
  • Series & DataFrame
  • Data Analysing Methods
  • Data Referencing & Cleaning Methods
  • Statistical Data Analysing Methods

Career Opportunities

  • Data Analyst/freelance
  • Data Engineer/Scientist
  • In Bhutan too, large companies hire people with data analysis skills (as consultants).

Introduction to Pandas

What?

An extremely versatile tool for manipulating datasets

Why?

    • Read external data
    • Data Cleaning
    • Data Extraction
    • Data Preparation for ML

Python Data Analysis Library
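The uses listed above can be sketched in a few lines. This is only an illustration: the CSV text, the column names "Name" and "Score", and the threshold 15 are made up for the example and are not part of the training material.

```python
import io
import pandas as pd

# Stand-in for an external CSV file; the column names and values are made up.
csv_text = io.StringIO(
    "Name,Score\n"
    "A,10\n"
    "B,\n"
    "C,30\n"
)

df = pd.read_csv(csv_text)          # read external data
clean = df.dropna()                 # data cleaning: drop rows with missing values
high = clean[clean["Score"] > 15]   # data extraction: keep rows matching a condition
print(high)
```

With a real file, the path to the file would be passed to read_csv() in place of csv_text.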

NumPy Vs Pandas

NumPy

Pandas

Getting Started with Pandas

Pandas can be installed using the Python package manager pip.

In Windows

  1. Open cmd
  2. Run the following command

pip install pandas

In Mac

  1. Open terminal
  2. Run the following command

pip3 install pandas

Video Link
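Once the installation finishes, one simple way to confirm it worked is to import the library and print its version. The variable name installed_version is chosen only for this sketch.

```python
import pandas as pd

# If this import succeeds, Pandas is installed for the Python in use.
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pip probably installed Pandas for a different Python installation than the one running the script.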

Pandas Series

Sample

Output

Problem

Write a Python program to display the names of at least 4 of your family members or friends.

Python Code
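The code from the original slide is not reproduced here, so the following is only one possible solution to the problem above. The four names are placeholders chosen for the example.

```python
import pandas as pd

# Placeholder names; replace them with your own family members or friends.
names = pd.Series(["Dorji", "Pema", "Karma", "Sonam"])

# Printing a Series shows the index (0, 1, 2, 3 by default) next to each value.
print(names)
```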

Pandas DataFrame

Python Code

Problem

Write a Python program to display the population data of any five Dzongkhags. Include the following information:

  • Dzongkhag
  • Population
  • Area

Sample Output
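The slide's code and sample output are not reproduced here, so this is a minimal sketch of one way to solve the problem above. The population and area figures are placeholders, not official statistics; replace them with real values before using the output.

```python
import pandas as pd

# Placeholder figures only, NOT official statistics.
data = {
    "Dzongkhag": ["Thimphu", "Paro", "Punakha", "Bumthang", "Haa"],
    "Population": [1000, 2000, 3000, 4000, 5000],
    "Area": [10, 20, 30, 40, 50],
}

# Each dictionary key becomes a column; each list position becomes a row.
df = pd.DataFrame(data)
print(df)
```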

Series and DataFrame

Datasets in Pandas are stored as either a Series or a DataFrame.

Series

  • Stores only one type of data, as in a single column of a table
  • Holds small data

DataFrame

  • Stores various types of data in the form of rows and columns
  • Holds large data (external data)

Data Analysing Methods

info() → Displays a summary of the DataFrame

max() → Returns the maximum value

min() → Returns the minimum value

sort_values() → Returns the DataFrame sorted by the values in the given column(s)

describe() → Returns basic statistics for the numerical columns in the DataFrame

head() → Returns the first five rows of the DataFrame

head(N) → Returns the first N rows of the DataFrame

tail() → Returns the last five rows of the DataFrame

tail(N) → Returns the last N rows of the DataFrame
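The methods above can be tried on a small DataFrame. The column names "Item" and "Total" and all values are made up for this sketch. Sorting is done with sort_values(), which is the method Pandas provides for sorting a DataFrame by column values.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Item": ["a", "b", "c", "d", "e", "f"],
    "Total": [30, 10, 60, 20, 50, 40],
})

df.info()                           # prints a summary: columns, types, non-null counts
print(df.describe())                # basic statistics for the numerical columns

largest = df["Total"].max()         # maximum value in the column
smallest = df["Total"].min()        # minimum value in the column
ordered = df.sort_values("Total")   # ascending order by default

first_five = df.head()              # first five rows
first_two = df.head(2)              # first N rows, here N = 2
last_five = df.tail()               # last five rows
last_two = df.tail(2)               # last N rows, here N = 2
```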

Data Analysing Methods (Activity 1)

Problem Statement

Write a Python program to:

  • sort the data in descending order based on the Total column.
  • display data of top 10 countries.
  • display data of the bottom 5 countries.

Link to csv file

Sample Output

Data Analysing Methods (Activity 1)

Python Code
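The slide's code and the linked CSV file are not reproduced here, so the following is a sketch of one possible solution that builds a small stand-in dataset inline. "Total" comes from the problem statement; the "Country" column name and all values are assumptions made for the example.

```python
import pandas as pd

# Stand-in for the linked CSV file. With the real file, this would be
# something like pd.read_csv(<path to the file>).
# "Country" is an assumed column name; the values are made up.
df = pd.DataFrame({
    "Country": [f"Country {i}" for i in range(1, 13)],
    "Total": [5, 42, 17, 8, 33, 21, 2, 50, 12, 27, 39, 14],
})

# Sort in descending order based on the Total column.
ranked = df.sort_values("Total", ascending=False)

top_10 = ranked.head(10)     # top 10 countries
bottom_5 = ranked.tail(5)    # bottom 5 countries

print(top_10)
print(bottom_5)
```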

Data Referencing & Cleaning Methods

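shape → Returns the number of rows and columns as a tuple (an attribute, so it is written without parentheses)

notna() → Detects non-null values; the result can be used to select them

dropna() → Removes rows or columns with missing data

fillna() → Fills missing values with a specified value

isin() → Filters data based on whether each value is in a given list of elements

loc[] → Accesses a group of rows and columns by labels (uses square brackets)

iloc[] → Accesses a group of rows and columns by integer position (uses square brackets)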
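A short sketch of these tools on a made-up DataFrame follows. The column names "Name" and "Score" and the values are assumptions for illustration. Note that in Pandas shape is an attribute, and loc and iloc are used with square brackets rather than parentheses.

```python
import pandas as pd

# Made-up data with two missing scores.
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Score": [10.0, None, 30.0, None],
})

rows_cols = df.shape                        # (rows, columns) tuple, no parentheses
present = df[df["Score"].notna()]           # rows whose Score is not missing
dropped = df.dropna()                       # drop rows containing any missing value
filled = df.fillna({"Score": 0})            # replace missing Score values with 0
chosen = df[df["Name"].isin(["a", "d"])]    # rows whose Name is in the list

by_label = df.loc[0:1, ["Name", "Score"]]   # label slice: end label 1 is included
by_position = df.iloc[0:1, 0:2]             # position slice: end position 1 is excluded
```

The last two lines show a common source of confusion: the same slice 0:1 selects two rows with loc but only one row with iloc.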

Data Referencing & Cleaning Methods (Activity 2)

Problem Statement

Write a Python program using Pandas to collect all rows with non-empty values under the ‘Calories’ column.

  • List the records where the most calories were burnt (350 and above).
  • Display only the dates on which those calories were burnt.

Link to csv file

Sample Output
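The linked CSV file and sample output are not reproduced here, so this is a sketch of one possible solution on a small stand-in dataset. ‘Calories’ comes from the problem statement; the "Date" column name and all values are assumptions made for the example.

```python
import pandas as pd

# Stand-in for the linked CSV file; "Date" is an assumed column name
# and the values are made up.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
    "Calories": [400.0, None, 300.0, 350.0],
})

# Keep only rows with a non-empty value under 'Calories'.
non_empty = df[df["Calories"].notna()]

# Rows where the calories burnt are 350 and above.
high_burn = non_empty[non_empty["Calories"] >= 350]

# Only the dates for those rows.
high_burn_dates = high_burn["Date"]

print(high_burn)
print(high_burn_dates)
```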

Statistical Data Analysing Methods

sum() → Returns the sum of the values in a column or row

mean() → Returns the average of the values in a column or row

median() → Returns the median of the values in a column or row

mode() → Returns the mode (most common value) of the values in a column or row

std() → Returns the standard deviation of the values in a column or row

corr() → Returns the pairwise correlation coefficients between columns
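The statistical methods above can be tried on a small made-up DataFrame. The column names "x" and "y" and the values are assumptions for illustration; y is deliberately set to twice x so that the correlation is easy to predict.

```python
import pandas as pd

# Made-up data; y is exactly 2 * x.
df = pd.DataFrame({
    "x": [1, 2, 2, 3, 4],
    "y": [2, 4, 4, 6, 8],
})

total = df["x"].sum()          # sum of the column
average = df["x"].mean()       # arithmetic mean
middle = df["x"].median()      # middle value when sorted
most_common = df["x"].mode()   # returns a Series, since there can be several modes
spread = df["x"].std()         # sample standard deviation (divides by n - 1)
relation = df.corr()           # table of correlations between the numeric columns

print(relation)
```

Because mode() returns a Series, the single most common value can be read with most_common.iloc[0].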

Statistical Data Analysing Methods (Activity 3)

Problem Statement

Write a Python program to:

  • Create a column called ‘Total_Appeared’ that gives the number of students who appeared in each subject by adding the ‘Female’ and ‘Male’ columns.
  • Find the total number of papers written by students.
  • Find the average of all the numeric data (from the 'CA Pass' column to the 'Total_Appeared' column).

Link to csv file.

Sample Output
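The linked CSV file and sample output are not reproduced here, so this sketch uses a small stand-in dataset. ‘Female’, ‘Male’, 'CA Pass' and ‘Total_Appeared’ come from the problem statement; the "Subject" column, the column order and all values are assumptions. It also assumes that each student who appeared wrote one paper per subject, so the total number of papers is the sum of ‘Total_Appeared’.

```python
import pandas as pd

# Stand-in for the linked CSV file; the real file likely has more columns.
# "Subject" is an assumed column name and the values are made up.
df = pd.DataFrame({
    "Subject": ["Subject 1", "Subject 2", "Subject 3"],
    "Female": [10, 20, 30],
    "Male": [15, 25, 35],
    "CA Pass": [20, 40, 60],
})

# New column: students who appeared in each subject.
df["Total_Appeared"] = df["Female"] + df["Male"]

# Total papers written, assuming one paper per student per subject.
total_papers = df["Total_Appeared"].sum()

# Average of every column from 'CA Pass' to 'Total_Appeared' (both included).
averages = df.loc[:, "CA Pass":"Total_Appeared"].mean()

print(df)
print(total_papers)
print(averages)
```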

Pandas in Summary

A powerful, open-source library for data manipulation and data analysis in Python, created by Wes McKinney

Pandas Resources

GitHub Repository

YouTube Video

Pandas Official Site

Pandas Activity Book
