1 of 59

DATAFRAMES PPT

XII – IP –PYTHON

 

2022.23 Python Syllabus

 

Data Handling using Pandas and Data Visualization

(25 Marks)

2 of 59

Introduction to Python libraries- Pandas, Matplotlib.

Data structures in Pandas - Series and Data Frames.

Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical operations;

Head and Tail functions; Selection, Indexing and Slicing.

Data Frames: Creation - from dictionary of Series, list of dictionaries, Text/CSV files; display; iteration; Operations on rows and columns: add, select, delete, rename; Head and Tail functions; Indexing using Labels, Boolean Indexing;

Importing/Exporting Data between CSV files and Data Frames.

Data Visualization: Purpose of plotting; drawing and saving following types of plots using Matplotlib – line plot, bar graph, histogram.

Customizing plots: adding label, title, and legend in plots.

3 of 59

A DataFrame is a Pandas data structure, which stores data in two-dimensional way. It is an ordered collection of columns where columns may store different types of data e.g., numeric or floating point or string or Boolean type, etc.

Characteristics:

  • It has two indexes/axes.
  • Row index (axis=0) & Column index (axis=1).
  • Row index is known as index,
  • Column index is known as column name.
  • Indexes can be of numbers or letters or strings.
  • Different columns can have data of different types.
  • Value is mutable (ie its value can change)
  • We can add/delete rows/columns in a DataFrame ie size-mutable.

DATAFRAME

4 of 59

CREATING A DATAFRAME

Before creation, we need to import two modules.

import pandas (or) import pandas as pd

import numpy (or) import numpy as np

(In the place of pd or np, we can use any valid identifier)

Syntax:

<dataFrameObject>=pandas.DataFrame(<a 2D datastructure>, [columns=<column sequence>], [index=<index sequence>])

 

We can create using:

  • Two-dimensional dictionaries ie dictionaries having lists or dictionaries or ndarrays or Series objects, etc.
  • Two-dimensional ndarrays (NumPy array)
  • Series type object
  • Another DataFrame object

Displaying a DataFrame is the same as displaying other variables and objects.
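The syntax above can be sketched as follows. This is a minimal illustration, assuming a small made-up set of marks; the names data and df are chosen only for this example.

```python
import pandas as pd

# A 2-D structure: each inner list is one row (values are made up).
data = [[51, 55], [52, 62], [53, 52]]

# columns= and index= are both optional; here both are supplied.
df = pd.DataFrame(data, columns=['RNo', 'Marks'],
                  index=['First', 'Second', 'Third'])

# A DataFrame is displayed like any other object.
print(df)
```

If columns or index is left out, pandas numbers that axis from 0 onwards.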

5 of 59

(i) Creating a DataFrame using a 2-D Dictionary:

A 2-D dictionary is a dictionary having items as (key:value), where value part is a data structure of any type i.e., another dictionary, an ndarray, a series object, a list, etc.

Value part of all the keys should have similar structure.

(a) Creating a dataframe from a 2D dictionary having values as lists:

>>>dict={'RNo':[51,52,53,54],'SName':

['Lahari','Chanakya','Harish','Neha'],

'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

 

Program to create a dataframe using 2-D Dictionary having values as lists:

import pandas as pd

dict={'RNo':[51,52,53,54],'SName':

['Lahari','Chanakya','Harish','Neha'],

'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

print(df)

output

By default, its index will be assigned 0 (zero) onwards.

Note : As per the textbook, the output columns are placed in ascending order, i.e. “Marks”, then “RNo”, then “SName”. In practice, the columns are displayed in the order in which they were entered.

6 of 59

Specifying Own Index:

>>>df=pd.DataFrame(dict,index=['First','Second','Third','Fourth'])

Note: If the number of index labels does not match the number of rows of data, a “ValueError” will occur.
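The note above can be checked with a short sketch. It assumes a reduced version of the student dictionary; the flag error_raised is a hypothetical name used only here.

```python
import pandas as pd

data = {'RNo': [51, 52, 53, 54], 'Marks': [55, 62, 52, 75]}

try:
    # Four rows of data but only three index labels.
    df = pd.DataFrame(data, index=['First', 'Second', 'Third'])
    error_raised = False
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```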

Example: Given a dictionary that stores “Mother Tongue” and “Population” as column names, create a DataFrame with state names as the index. Note: Population in crores.

Program:

import pandas as pd

dict={'Mother Tongue':['Telugu','Tamil','Hindi'],

'Population':[6,8,12]}

df=pd.DataFrame(dict,index=['AP','TN','Maharastra'])

print(df)

7 of 59

(b) Creating a dataframe from a 2D dictionary having values as ndarrays:
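A minimal sketch of this case, assuming the same roll numbers and marks used in the list example; the names rno, marks and stu are illustrative.

```python
import numpy as np
import pandas as pd

# Each value of the dictionary is a 1-D ndarray of the same length.
rno = np.array([51, 52, 53, 54])
marks = np.array([55, 62, 52, 75])

stu = {'RNo': rno, 'Marks': marks}
df = pd.DataFrame(stu)   # keys become column names
print(df)
```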

8 of 59

(c) Creating a dataframe from a 2D dictionary having values as dictionary object:

dict={'RNo':{'First':51,'Second':52,'Third':53,'Fourth':54},

'SName':{'First':'Lahari','Second':'Chanakya','Third':'Harish','Fourth':'Neha'},

'Marks':{'First':55,'Second':62,'Third':52,'Fourth':75}}

df=pd.DataFrame(dict)

dict={'First':{'RNo':51,'SName':'Lahari','Marks':55},

'Second':{'RNo':52,'SName':'Chanakya','Marks':62},

'Third':{'RNo':53,'SName':'Harish','Marks':52},

'Fourth':{'RNo':54,'SName':'Neha','Marks':75}}

df=pd.DataFrame(dict)

9 of 59

Special Condition:

The inner dictionaries of a 2D dictionary may have dissimilar keys. A DataFrame can still be created from such non-matching inner keys.

All the inner keys become indexes, and NaN values are added for the keys that an inner dictionary does not have.

Program:

import pandas as pd

C1={'Qty':95,'Half Yearly':89}

C2={'Half Yearly':94,'Annual':97}

Marks={'Student 1':C1,'Student 2':C2}

df=pd.DataFrame(Marks)

print(df)

OUTPUT

10 of 59

(ii) Creating a Dataframe Object from a List of Dictionaries/Lists:

(a) Creating a Dataframe using a list having List of dictionaries :

  If we pass a 2D list having dictionaries as its elements (list of dictionaries) to pandas.DataFrame() function, it will create a DataFrame object such that the inner dictionary keys will become the columns and inner dictionary’s values will make rows.

Ex:

import pandas as pd

dict1={'RNo':51,'SName':'Lahari','Marks':55}

dict2={'RNo':52,'SName':'Chanakya','Marks':62}

dict3={'RNo':53,'SName':'Harish','Marks':52}

dict4={'RNo':54,'SName':'Neha','Marks':75}

students=[dict1,dict2,dict3,dict4]

df=pd.DataFrame(students)

print(df)

11 of 59

Note : We can also include indexes as follows:

df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])

Note: If we do not give the same column name in every row, “NaN” values will appear for the missing entries.

Program:

import pandas as pd

dict1={'RNo':51,'SName':'Lahari','Marks':55}

dict2={'RNo':52,'Name':'Chanakya','Marks':62}

dict3={'RNo':53,'Name':'Harish','Marks':52}

dict4={'RNo':54,'SName':'Neha','Marks':75}

students=[dict1,dict2,dict3,dict4]

df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])

print(df)

OUTPUT

12 of 59

(b) Creating using a list having List of lists:

lists=[[10,20,40],['A','B','C','D'],[33.5,55.75,2.5]]

df=pd.DataFrame(lists)
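A short sketch of what the list of lists above produces. Each inner list becomes a row, and since the inner lists differ in length, the shorter rows are padded with missing values (displayed as None or NaN depending on the pandas version).

```python
import pandas as pd

lists = [[10, 20, 40], ['A', 'B', 'C', 'D'], [33.5, 55.75, 2.5]]
df = pd.DataFrame(lists)

print(df)
print(df.shape)         # rows x columns
print(df.isna())        # True marks the padded cells
```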

Inserting Rows & Column Names:

import pandas as pd

lists=[[51,'Lahari',55],[52,'Chanakya',62],[53,'Harish',52]]

#each inner list is a row

df=pd.DataFrame(lists,columns=['RNo','SName','Marks'],index=['First','Second','Third'])

print(df)

13 of 59

(iii) Creating a dataframe Object from a 2-D ndarray:

We can pass a two-dimensional NumPy array (i.e., one having shape (<n>,<m>)) to DataFrame( ) to create a dataframe object.

 Consider the program to create np array:

import numpy as np

import pandas as pd

narr=np.array([[10,20,30],[40,50,60]],np.int32)

print(narr)

Output

[[10 20 30]

 [40 50 60]]

Program:

import numpy as np

import pandas as pd

narr=np.array([[10,20,30],[40,50,60]],np.int32)

mydf=pd.DataFrame(narr)

print(mydf)

OUTPUT

14 of 59

narr=np.array([[10.7,20.5],[40,50],[25.2,55]])

mydf=pd.DataFrame(narr,columns=["One","Two"],index=['A','B','C'])

print(mydf)

We can specify either columns or index or both the sequences.

Note : If the rows of the ndarray differ in length, i.e., if the number of elements in each row differs, then pandas will create just a single column in the dataframe object and the type of the column will be object.

Example:

narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]],

dtype="object")

print(narr)

Output

[list([10.7, 20.5, 30.2]) list([40, 50]) list([25, 55, 11, 45])]

15 of 59

Program:

narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")

mydf=pd.DataFrame(narr)

Output

(iv) Creating a dataframe Object from a 2D Dictionary with Values as Series Objects:

import pandas as pd

RN=pd.Series([11,12,13,14])

SN=pd.Series(['Rajesh','Likhith','Navya','Bhavya'])

M=pd.Series([56,75,91,82])

studict={'RNo':RN,'SName':SN,'Marks':M}

mydf=pd.DataFrame(studict)

print(mydf)

Output

16 of 59

(v) Creating a dataframe Object from another DataFrame Object:

Program:

import pandas as pd

dict={'RNo':[51,52,53,54],'SName':['Lahari','Chanakya',

'Harish','Neha'],'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

dfnew=pd.DataFrame(df)

print(dfnew)

OUTPUT

(new DataFrame created from existing DataFrame)

17 of 59

DATAFRAME ATTRIBUTES

All information related to a DataFrame such as its size, datatype, etc is available through its attributes.

Syntax to use a specific attribute:

<DataFrame object>.<attribute name>

Attribute : Description

index : The index (row labels) of the DataFrame

columns : The column labels of the DataFrame

axes : Returns axis 0, i.e., index, and axis 1, i.e., columns, of the DataFrame

dtypes : Returns the data types of data in the DataFrame

size : Returns an int representing the number of elements in this object

shape : Returns a tuple representing the dimensionality of the DataFrame, i.e., (no.of rows, no.of columns)

values : Returns a NumPy representation of the DataFrame

empty : Indicates whether the DataFrame is empty

ndim : Returns an int representing the number of axes/array dimensions

T : Transpose

18 of 59

Example of a DataFrame DF:

Retrieving various properties of a DataFrame Object:

>>>df.index #above example

Index(['First', 'Second', 'Third', 'Fourth'], dtype='object')

(for default indexes)

>>>df.index

RangeIndex(start=0, stop=4, step=1)

>>> df.columns

Index(['RNo', 'SName', 'Marks'], dtype='object')

>>>df.axes

[Index(['First', 'Second', 'Third', 'Fourth'], dtype='object'), Index(['RNo', 'SName', 'Marks'], dtype='object')]

>>>df.dtypes

RNo int64

SName object

Marks int64

dtype: object

>>>df.size#4 rows X 3 columns

12

>>>df.shape #(no.of rows, no.of columns)

(4, 3)

>>>df.values# Numpy representation

[ [51 'Lahari' 55]

[52 'Chanakya' 62]

[53 'Harish' 52]

[54 'Neha' 75] ]

19 of 59

>>>df.empty

#if DataFrame is empty, gives True

False

>>>df.ndim # As DataFrame is a 2 Dimensional

2

>>>df.T

#Transpose. Rows will become columns and vice versa.

Example of a DataFrame DF:

OTHERS

Function : Description

len(<DF Object>) : Returns the number of rows in a dataframe

<DF Object>.count( ) : If we pass no argument or 0 (the default is 0), it returns the count of non-NA values for each column; if we pass 1, it returns the count of non-NA values for each row.

20 of 59

>>>len(df)

4

>>>df.count( )

#df.count(0) or df.count(axis='index')

RNo 4

SName 4

Marks 4

dtype: int64

>>>df.count(1) # df.count(axis='columns')

First 3

Second 3

Third 3

Fourth 3

dtype: int64

>>>df.shape[0]# to get number of rows

4

>>>df.shape[1]# to get number of columns

3
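The difference between len( ) and count( ) only shows up when some values are missing. A minimal sketch, assuming a made-up frame with one NaN mark:

```python
import numpy as np
import pandas as pd

# One mark is missing (NaN); values are made up for illustration.
df = pd.DataFrame({'RNo': [51, 52, 53], 'Marks': [55, np.nan, 52]},
                  index=['First', 'Second', 'Third'])

print(len(df))             # number of rows, NaN or not
print(df.count())          # non-NA values in each column
print(df.count(axis=1))    # non-NA values in each row
```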

21 of 59

SELECTING/ACCESSING DATA

Selecting/Accessing a Column:

Syntax:<DataFrame object> [<column name>]

(or)<DataFrame object>.<column name>

>>>df['Private']

AP 100

TN 98

TS 110

Name: Private, dtype: int64

>>>df.Aided

AP 75

TN 92

TS 85

Name: Aided, dtype: int64

Selecting/Accessing Multiple Columns:

<DataFrame object>[ [<column name>,

<column name>,…..] ]

>>>df[['ZP','Aided','Govt']]
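The selections above can be tried on a frame like the one the slides use. In this sketch the Private and Aided columns, the AP values and the Govt value for TS follow the outputs shown in the slides; the remaining Govt and ZP values are assumptions made up for illustration.

```python
import pandas as pd

# Govt for TN and ZP for TN/TS are made-up values (not from the slides).
schools = {'Private': [100, 98, 110],
           'Aided': [75, 92, 85],
           'Govt': [125, 115, 110],
           'ZP': [89, 91, 87]}
df = pd.DataFrame(schools, index=['AP', 'TN', 'TS'])

one_col = df['Private']                    # one column comes back as a Series
same_style = df.Aided                      # dot notation, same kind of result
many_cols = df[['ZP', 'Aided', 'Govt']]    # a list of names gives a DataFrame

print(one_col)
print(many_cols)
```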

22 of 59

Selecting/Accessing a subset from a DataFrame using Row/Column Names using loc function:

To access row(s) and/or a combination of rows and columns, we can use loc function.

Syntax: <DataFrame Object>.loc[<startrow>:<endrow>,<startcolumn>:<endcolumn>]

To access one row (loc)

Just give the row name/label.

>>>df.loc['AP',:] #df.loc['AP'] or df.loc['AP',]

Private 100

Aided 75

Govt 125

ZP 89

Name: AP, dtype: int64

 

>>>df.loc['Maharastra',:]

KeyError: 'Maharastra'

23 of 59

To access multiple rows (loc):

>>>df.loc['AP':'TS',] # df.loc['AP':'TS'] or df.loc['AP':'TS',:]

>>>df.loc[['TS','AP']] #df.loc[['TS','AP'],] or df.loc[['TS','AP'],:]

>>>df.loc[['TN','TS','AP']]

>>>df.loc['TN':'AP']

Empty DataFrame

Columns: [Private, Aided, Govt, ZP]

Index: [ ]

24 of 59

To access a column using loc:

>>>df.loc[:,'Private']

AP 100

TN 98

TS 110

Name: Private, dtype: int64

 

To access multiple columns using loc:

>>>df.loc[:,'Private':'Govt']

Note: All columns between start and end columns are listed. 

>>>df.loc[:,'Aided':]

>>>df.loc[:,'Govt':'Private']

Empty DataFrame

Columns: [ ]

Index: [AP, TN, TS] 

>>>df.loc[:,['Govt','Private','ZP']]

25 of 59

To access range of columns from a range of rows:

  <DF Object>.loc[<startrow>:<endrow>,<startcolumn>:<endcolumn>]

 

>>>df.loc['AP':'TN','Aided':'ZP']

>>> df.loc['TN':'AP','Aided':'ZP']

Empty DataFrame

Columns: [Aided, Govt, ZP]

Index: [ ]

 

>>> df.loc['AP':'TS','ZP':'Aided']

Empty DataFrame

Columns: [ ]

Index: [AP, TN, TS]

26 of 59

Selecting/Accessing a subset from a DataFrame using Row/Column Numeric Positions using iloc function:

With this function, we can extract a subset from a dataframe using the row and column numeric index/position. iloc means integer location.

Syntax:

<DF Object>.iloc[<start row index>:<end row index>, <start col index>:<end column index>]

When we use iloc, then <startindex>:<endindex> given for rows and columns work like slices, and the end index is excluded.

Note: With loc, both start label and end label are included when given as start:end, but with iloc, like slices end index/position is excluded when given as start:end.
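The note above can be seen side by side in a small sketch. It assumes only the Private and Aided columns of the school frame.

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

by_label = df.loc['AP':'TN']     # label slice: both AP and TN are included
by_position = df.iloc[0:1]       # position slice: position 1 is excluded

print(by_label)
print(by_position)
```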

To access one row (iloc):

>>>df.iloc[0] #df.iloc[0,] or df.iloc[0,:]

>>>df.iloc[2]

>>>df.iloc[4]

IndexError: single positional indexer is out-of-bounds

27 of 59

To access multiple rows (iloc):

>>>df.iloc[0:2] # df.iloc[0:2,] or df.iloc[0:2,:]

>>>df.iloc[0:55]

>>>df.iloc[1:20]

>>>df.iloc[1:1]

Empty DataFrame

Columns: [Private, Aided, Govt, ZP]

Index: [ ]

>>>df.iloc[[1,2]] # df.iloc[[1,2], ] or df.iloc[[1,2], : ]

>>>df.iloc[[2,0,1]]

28 of 59

To access a column using iloc:

>>>df.iloc[:,1]

>>>df.iloc[:,[1]]

To access multiple columns using iloc:

>>>df.iloc[:,1:3] #Excluding column 3

>>>df.iloc[:,[2,0]]

>>>df.iloc[:,1:]

>>>df.iloc[:,2:0]

Empty DataFrame

Columns: []

Index: [AP, TN, TS]

>>>df.iloc[:,[2,0,1]]

29 of 59

To access range of columns from a range of rows (iloc):

>>>df.iloc[1:3,0:2]#Rows 1,2 & Columns 0,1

>>>df.iloc[[1,2],[0,1,2]]

>>>df.iloc[2:2,0:2]

Empty DataFrame

Columns: [Private, Aided]

Index: []

>>>df.iloc[1:3,2:0]

Empty DataFrame

Columns: []

Index: [TN, TS]

>>>df.iloc[[1,3],0:2]

IndexError: positional indexers are out-of-bounds

30 of 59

Selecting or Accessing Individual Value:

(i) Either give name of row or numeric index in square brackets.

Syntax:<DF Object>.<column> [<row name or row numeric index>]

Ex:>>>df.Govt['AP']

125

# df.Govt['AP']=200 will change the value to 200

>>>df.Govt[2]

110

(ii) .at function: Access a single value for a row/column label pair by labels.

Syntax:<DF Object>.at[<row label>,<col label>]

Ex: >>> df.at['AP','Aided']

75

# df.at['AP','Aided']=500 will change the value to 500

>>> df.at['Meghalaya','Aided']

KeyError: 'Meghalaya'

>>> df.at['AP','Orissa']

KeyError: 'Orissa'

31 of 59

(iii) .iat function: Access a single value for a row/column pair by integer position.

Syntax: <DF Object>.iat[<row index no>,<col index no>]

>>>df.iat[1,0]

98

# df.iat[1,0]=777 will change the value to 777
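A minimal sketch of reading and changing single values with at and iat, assuming only the Private and Aided columns of the school frame:

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

before_at = df.at['AP', 'Aided']    # by row label and column label
before_iat = df.iat[1, 0]           # by row position and column position

df.at['AP', 'Aided'] = 500          # change one cell by labels
df.iat[1, 0] = 777                  # change one cell by positions
print(df)
```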

ADDING/MODIFYING ROWS/COLUMNS

The process of adding and modifying rows/columns value is similar.

32 of 59

Adding/Modifying a Row:

We can change or add rows to a DataFrame using at or loc attributes.

at:<DF Object>.at[<row name>,:]=<new value>

If there is a row with the given row label, it changes the values.

>>> df.at['AP']=123

>>>df

>>> df.at['TN',:]=[200,300,400,500]

>>>df

>>> df.at['TS',:]=[10,11,12]

ValueError: could not broadcast input array from shape (3,) into shape (4,)

 

If there is no row with such row label, adds new row with this row label and assigns given values to all its columns.

When you add a new row with at, the data may become float.

>>> df.at['MP']=[111,222,333,444]

>>>df.at[1]=[20,21,22,23]

33 of 59

>>>df

loc( ):

>>>df.loc['AP',:]=[300,301,302,303]

>>>df.loc['TN',:]=401,402,403,404

>>>df

>>>df.loc['MP',:]=501,502,503,504

>>>df

>>>df.loc['Orissa',:]=601,602,603

ValueError: could not broadcast input array from shape (3,) into shape (4,)

iloc( ):

>>>df.iloc[1]=[1,2,3,4]

>>>df

>>>df.iloc[2,:]=[5,6,7,8]

>>>df.iloc[[0,1]]=[[100,200,300,400],[11,22,33,44]]

>>>df
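A short sketch of changing and adding rows with loc. It assumes a two-column version of the school frame, so each row needs exactly two values; mismatch_error is a hypothetical flag used only here.

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

df.loc['AP'] = [300, 301]       # existing label: the row is modified
df.loc['MP'] = [501, 502]       # new label: a row is added

try:
    df.loc['Orissa'] = [601, 602, 603]   # three values for two columns
    mismatch_error = False
except ValueError as err:
    mismatch_error = True
    print("ValueError:", err)

print(df)
```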

34 of 59

Adding/Modifying a Column:

Assigning a value to a column:

  • Will modify it, if the column already exists
  • Will add a new column, if it does not exist already.

Syntax:

<DF Object>.<column name>=<new value>

(or)<DF Object>[<column>]=<new value>

Let us consider our original dataframe:

>>>df['ZP']=99

>>>df

>>>df['Corporate']=[44,55,66]

>>>df

Let us consider our original dataframe:

>>>df.Aided=11,22,33 #df.Aided=[11,22,33]

>>>df

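Note: If we give the following for a column that does not exist yet,

>>>df.corporate=11,12,13 or

>>>df.corporate=[11,12,13],

no error will be displayed, but no new column will be stored in the DataFrame. A new column has to be added using the <DF Object>[<column>] form.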
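The behaviour in the note can be confirmed with a small sketch, assuming a two-column version of the school frame. pandas may print a warning for the dot assignment, so the sketch silences warnings around that line.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

df.Aided = [11, 22, 33]            # existing column: values are modified

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.corporate = [11, 12, 13]    # no such column: nothing is added

print('corporate' in df.columns)

df['Corporate'] = [44, 55, 66]     # bracket form adds the column
print(df)
```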

35 of 59

Other ways to add columns:

<DF object>.at[:,<column name>]=<values for column>

  <DF object>.loc[:,<column name>]=<values for column>

>>>df.loc[:,'Govt']=200

>>>df.loc[:,'ZP']=300,400,500

>>>df.loc[:,'Cor']=11,12,13

>>>df.loc[:,'Private']=[10,11,12]

iloc( ):

>>>df.iloc[:,3]=1200

>>>df.iloc[:,1:3]=[[1,2],[3,4],[5,6]]

>>>df

36 of 59

<DF object>=<DF object>.assign(<column name>=<values for column>)

>>>df=df.assign(Private=[10,11,12])

>>>df=df.assign(Corporate=[33,34,35])

>>>df

>>> df2=df.assign(Aided=777)

>>> df2

>>>df=df.assign(New=[55,56])

ValueError: Length of values (2) does not match length of index (3)
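assign( ) returns a new DataFrame and leaves the original untouched, which is why the result is stored back in the examples above. A minimal sketch, assuming a two-column version of the school frame:

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

df2 = df.assign(Corporate=[33, 34, 35])   # new column in the returned frame
df3 = df.assign(Aided=777)                # existing column replaced in the copy

print(df)     # original frame is unchanged
print(df2)
print(df3)
```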

37 of 59

Modifying a Single Cell :

(i)>>> df.Govt[2]=777

>>> df

(ii)>>> df.at['TN','Govt']=999

>>> df

>>> df.at['MP','ZP']

KeyError: 'MP'

(iii)>>> df.iat[0,3]=2022

>>> df

(iv)>>> df.loc['TS','ZP']=555

>>> df

(v) >>> df.iloc[2,0]=333

>>> df

38 of 59

RENAMING ROWS/COLUMNS

To change the name of any row/column individually, we can use the rename( ) function.

rename( ) function by default does not make changes in the original dataframe. It creates a new dataframe with the changes and the original dataframe remains unchanged.

Syntax:

<DF>.rename(index={<names dictionary>},

columns={<names dictionary>}, inplace=False)

>>> df.rename(index={'TN':'Tamil Nadu',

'AP':'Andhra Pradesh'},inplace=True)

>>> df

>>>df.rename(columns={'Private':'Personal',

'ZP':'Zilla Parishad'},inplace=True)

>>> df

Another Example:

>>> df.rename(index={'Ben':'Benches','Tab':'Tables'},

columns={'A':'Sec A','B':'Sec B','C':'Sec C'})

39 of 59

>>> df

Note : If we do not add “inplace=True”, the command only displays the modified names when it is executed. It does not really modify the dataframe. So, to modify the names, we need to add “inplace=True”.

(inplace=True performs the rename operation in the same dataframe)

>>> df.rename(index={'Ben':'Benches','Tab':'Tables'},

columns={'A':'Sec A','B':'Sec B','C':'Sec C'},inplace=True)

>>> df
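The effect of inplace can be seen in a short sketch. It assumes a two-column version of the school frame rather than the Ben/Tab frame, whose values are not given in the text.

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

# Without inplace=True a renamed copy is returned; df keeps its old names.
renamed = df.rename(index={'TN': 'Tamil Nadu'},
                    columns={'Private': 'Personal'})
print(df)
print(renamed)

# With inplace=True the same dataframe is changed.
df.rename(index={'AP': 'Andhra Pradesh'}, inplace=True)
print(df)
```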

40 of 59

DELETING ROWS/COLUMNS

Two ways to delete:

drop( ) for rows and columns, and del for columns

(i) Delete row(s) using drop( ) function:

Syntax:<DF>.drop(index or sequence of indexes)

>>> df.drop(["TS","AP"],inplace=True)

or

>>> df.drop(["TS","AP"],axis=0,inplace=True)

>>> df

(ii) Delete a column, using drop( ) function:

>>> df.drop(['Private','Aided','ZP'],axis=1,inplace=True)

>>> df

>>> df.drop(["Aided"],axis=1,inplace=True)

>>> df

(Other examples of drop( ):

df.drop(range(2,15,3)) #drops rows with indexes 2,5,8,11,14

df.drop([2,4,6,8,12])

Argument to drop( ) should be either an index, or a sequence containing indexes.)

(iii) Delete a column, using the del statement:

Syntax: del <DF object>[<column name>]

>>> del df['ZP']

>>> df
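A minimal sketch comparing drop( ) and del, assuming a two-column version of the school frame. Without inplace=True, drop( ) returns a new frame; del always changes the dataframe itself.

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110], 'Aided': [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

rows_dropped = df.drop(['TS', 'AP'])         # rows removed in the returned copy
cols_dropped = df.drop(['Aided'], axis=1)    # column removed in the returned copy

del df['Aided']                              # column removed from df itself

print(rows_dropped)
print(cols_dropped)
print(df)
```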

41 of 59

BOOLEAN INDEXING

Boolean Indexing refers to having Boolean values [(True or False), or sometimes (1 or 0)] as indexes of a dataframe.

The Boolean indexes divide the dataframe in two groups – True rows and False rows.

In some situations, we may need to divide our data in two subsets – True or False, e.g., your school has decided to launch online classes for you. But some days of the week are designated for it. So, a dataframe related to this information might look like:

 

        Day         No.of Classes
True    Monday      5
False   Tuesday     0
True    Wednesday   3
False   Thursday    4
True    Friday      7
True    Saturday    2

Creating DataFrames with Boolean Indexing:

While creating a dataframe with Boolean indexes, True and False should not be enclosed in quotes (otherwise, a KeyError will be generated when the rows are accessed with True or False).

 

Create a dataframe containing online classes information, through the code :

import pandas as pd

Days=['Mon','Tue','Wed','Thu','Fri','Sat']

Classes = [5,0,3,4,7,2]

dict={'Days':Days,'No.of Classes':Classes}

df=pd.DataFrame(dict,

index=[True,False,True,False,True,True])

print(df)

This is useful division in situations where we find out things like – On which days, the online classes are held? Or which ones are offline classes days? And so on.

42 of 59

We can also provide Boolean indexing to dataframes as 1s and 0s.

df=pd.DataFrame(dict,index=[1,0,1,0,1,1])

Accessing Rows from DataFrames with Boolean Indexing:

Boolean indexing is very useful for filtering records i.e., for finding or extracting the True or False indexed rows.

<DF>.loc[True] : Display all records with True index

<DF>.loc[False] : Display all records with False index

<DF>.loc[1] : Display all records with index as 1

<DF>.loc[0] : Display all records with index as 0

Ex:

Days=['Mon','Tue','Wed','Thu','Fri','Sat']

Classes = [5,0,3,4,7,2]

dict={'Days':Days,'No.of Classes':Classes}

df=pd.DataFrame(dict,

index=[True,False,True,False,True,True])

>>>df.loc[True] #df.loc[1]

>>>df.loc[False] #df.loc[0]
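The same filtering can be sketched with the 1s and 0s form of the index, using the online classes data from the slides. The names online and offline are illustrative.

```python
import pandas as pd

Days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
Classes = [5, 0, 3, 4, 7, 2]

# 1 marks an online-class day, 0 marks an offline day.
df = pd.DataFrame({'Days': Days, 'No.of Classes': Classes},
                  index=[1, 0, 1, 0, 1, 1])

online = df.loc[1]     # all rows whose index is 1
offline = df.loc[0]    # all rows whose index is 0

print(online)
print(offline)
```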

43 of 59

PYTHON PANDAS 2

  • Iterating Over a DataFrame
  • Binary Operations in a Dataframe

44 of 59

PYTHON PANDAS 2 – Iterating Over a DataFrame

 

          Teachers  Students  Area   Ratio
Private   30        290       Urban  9.66
Govt      18        185       Rural  10.27
Aided     15        120       Rural  8
CBSE      35        325       Urban  9.28
ICSE      25        260       Urban  10.4

iterrows( ) : This method iterates over dataframe row wise where each horizontal subset is in the form of (row-index,Series) where Series contains all column values for that row-index

>>> dict={'Teachers':[30,18,15,35,25],'Students':[290,185,120,325,260],'Area':['Urban','Rural', \

'Rural','Urban','Urban'], 'Ratio':[9.66,10.27,8,9.28,10.4]}

>>> DF=pd.DataFrame(dict,index=['Private','Govt','Aided','CBSE','ICSE'])

45 of 59

Example : Using iterrows( ) to extract data from dataframe row wise.

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for (row,rowSeries) in DF.iterrows():

    print("Row index:",row)

    print("Containing: ")

    print(rowSeries)

OUTPUT

Row index: Private

Containing:

Teachers 30

Students 290

Area Urban

Ratio 9.66

Name: Private, dtype: object

Row index: Govt

Containing:

Teachers 18

Students 185

Area Rural

Ratio 10.27

Name: Govt, dtype: object

--------

--------

46 of 59

Example : Using iterrows( ) to extract row-wise Series objects

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for (row,rowSeries) in DF.iterrows():

    print("Row index:",row)

    print("Containing: ")

    i=0

    for val in rowSeries:

        print("At",i,"position: ",val)

        i=i+1

OUTPUT

Row index: Private

Containing:

At 0 position: 30

At 1 position: 290

At 2 position: Urban

At 3 position: 9.66

Row index: Govt

Containing:

At 0 position: 18

At 1 position: 185

At 2 position: Rural

At 3 position: 10.27

Row index: Aided

-----

-----

47 of 59

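iteritems( ) : This method iterates over a dataframe column wise, where each vertical subset is in the form of (col-index,Series) and the Series contains all row values for that column-index. In pandas 2.0 and later, iteritems( ) has been removed and the same method is available as items( ).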

48 of 59

Example : Using iteritems( ) to extract data from dataframe column wise.

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for (col,colSeries) in DF.items(): #DF.iteritems() in pandas older than 2.0

    print("Column index:",col)

    print("Containing: ")

    print(colSeries)

OUTPUT

Column index: Teachers

Containing:

Private 30

Govt 18

Aided 15

CBSE 35

ICSE 25

Name: Teachers, dtype: int64

Column index: Students

Containing:

Private 290

Govt 185

Aided 120

CBSE 325

ICSE 260

Name: Students, dtype: int64

--------

--------

--------

49 of 59

Example : Using iteritems( ) to extract dataframe column wise series object

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for (col,colSeries) in DF.items(): #DF.iteritems() in pandas older than 2.0

    print("Column index:",col)

    print("Containing: ")

    i=0

    for val in colSeries:

        print("At row ",i,":",val)

        i=i+1

OUTPUT

Column index: Teachers

Containing:

At row 0 : 30

At row 1 : 18

At row 2 : 15

At row 3 : 35

At row 4 : 25

Column index: Students

Containing:

At row 0 : 290

At row 1 : 185

At row 2 : 120

At row 3 : 325

At row 4 : 260

Column index: Area

--------

--------

--------

50 of 59

Write a program to print the DataFrame DF, one row at a time

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for i,j in DF.iterrows():

    print(i)

    print(j)

    print("____________")

OUTPUT

Private

Teachers 30

Students 290

Area Urban

Ratio 9.66

Name: Private, dtype: object

____________

Govt

Teachers 18

Students 185

Area Rural

Ratio 10.27

Name: Govt, dtype: object

____________

--------

51 of 59

Write a program to print the DataFrame DF, one column at a time

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for i,j in DF.items(): #DF.iteritems() in pandas older than 2.0

    print(i)

    print(j)

    print("____________")

OUTPUT

Teachers

Private 30

Govt 18

Aided 15

CBSE 35

ICSE 25

Name: Teachers, dtype: int64

____________

Students

Private 290

Govt 185

Aided 120

CBSE 325

ICSE 260

Name: Students, dtype: int64

_____________

--------

52 of 59

Printing individual columns from a row:

When accessing rows of a DataFrame using iterrows(), you can print an individual column value from that row by using rowSeries[<column>].

i.e., after the line,

for r, Row in df.iterrows( ):

You can print individual column value as :

Row[<column name>]

53 of 59

Write a program to print only the values from Teachers column, for each row

import pandas as pd

dict={'Teachers':[30,18,15,35,25],\

'Students':[290,185,120,325,260],\

'Area':['Urban','Rural','Rural','Urban','Urban'],\

'Ratio':[9.66,10.27,8,9.28,10.4]}

DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\

'CBSE','ICSE'])

for row,rowSeries in DF.iterrows():

    print(rowSeries['Teachers'])

    print("------")

OUTPUT

30

------

18

------

15

------

35

------

25

------

54 of 59

BINARY OPERATIONS IN A DATAFRAME

Binary operations are operations that require two values, and these values are picked element-wise. In a binary operation, the data from two dataframes are aligned on the basis of their row and column indexes. The given operation is performed for each matching row and column index, and NaN is stored in the result for each non-matching row or column index.

Binary Operations: addition, subtraction, multiplication, division

Example Data Frames : DF1, DF2, DF3, DF4

DF1

DF2

DF3

DF4

55 of 59

Program to create the dataframes DF1, DF2, DF3, DF4

import pandas as pd

dict1={'A':[11,17,23],'B':[13,19,25],'C':[15,21,27]}

DF1=pd.DataFrame(dict1)

dict2={'A':[12,18,24],'B':[14,20,26],'C':[16,22,28]}

DF2=pd.DataFrame(dict2)

dict3={'A':[1,3,5],'B':[2,4,6]}

DF3=pd.DataFrame(dict3)

dict4={'A':[7,9],'B':[8,10]}

DF4=pd.DataFrame(dict4)

56 of 59

Addition : [ Using +, add( ), radd( ) ]

Note : DF1.add(DF2) is equal to DF1+DF2

DF1.radd(DF2) is equal to DF2+DF1

radd( ) means reverse addition

>>>DF1+DF2 #DF1.add(DF2)

>>>DF1+DF3

>>>DF1+DF4

>>>DF2+DF3

>>>DF2+DF4

>>>DF3+DF4

>>>DF3.add(DF4)
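The results of these additions can be reproduced with the four dataframes defined earlier. The sketch below shows how alignment produces NaN; the names total_12, total_13 and total_34 are illustrative.

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF2 = pd.DataFrame({'A': [12, 18, 24], 'B': [14, 20, 26], 'C': [16, 22, 28]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

total_12 = DF1 + DF2    # same rows and columns: every cell is added
total_13 = DF1 + DF3    # DF3 has no column C, so C is NaN in the result
total_34 = DF3 + DF4    # DF4 has no row 2, so row 2 is NaN in the result

print(total_12)
print(total_13)
print(total_34)
```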

57 of 59

Subtraction: [ Using -, sub( ), rsub( ) ]

 

Note : DF1.sub(DF2) is equal to DF1-DF2

DF1.rsub(DF2) is equal to DF2-DF1

rsub( ) means reverse subtraction

>>>DF1-DF2

>>>DF3-DF1

>>>DF1-DF4

>>>DF4-DF1

>>>DF2-DF3

>>>DF3-DF2

>>>DF2-DF4

>>>DF4-DF2

>>>DF3-DF4

#DF3.sub(DF4)

>>>DF4-DF3

#DF3.rsub(DF4)
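A short sketch checking the sub( ) and rsub( ) notes above with DF3 and DF4 as defined earlier:

```python
import pandas as pd

DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

forward = DF3.sub(DF4)     # same as DF3 - DF4
reverse = DF3.rsub(DF4)    # same as DF4 - DF3

print(forward)
print(reverse)
```

Row 2 exists only in DF3, so it is NaN in both results.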

58 of 59

Multiplication: [ Using *, mul( ), rmul( ) ]

 

Note : DF1.mul(DF2) is equal to DF1*DF2

DF1.rmul(DF2) is equal to DF2*DF1

rmul( ) means reverse multiplication

>>>DF1*DF2

>>>DF1*DF3

>>>DF1*DF4

>>>DF2*DF3

>>>DF2*DF4

>>>DF3*DF4

59 of 59

Division: [ Using /, div( ), rdiv( ) ]

Note : DF1.div(DF2) is equal to DF1/DF2

DF1.rdiv(DF2) is equal to DF2/DF1

rdiv( ) means reverse division.

>>>DF1/DF2

>>>DF2/DF1

>>>DF1/DF3

>>>DF3/DF1

>>>DF1/DF4

>>>DF2/DF3

>>>DF3/DF2

>>>DF2/DF4

>>>DF4/DF2

>>>DF3/DF4

>>>DF4/DF3
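A minimal sketch of division with the dataframes defined earlier. Division always gives floating point results, and non-matching rows or columns give NaN as with the other operations.

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF2 = pd.DataFrame({'A': [12, 18, 24], 'B': [14, 20, 26], 'C': [16, 22, 28]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

quotient = DF1 / DF2        # same as DF1.div(DF2)
reverse = DF3.rdiv(DF4)     # same as DF4 / DF3

print(quotient)
print(reverse)
```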