3 of 91

A DataFrame is a Pandas data structure that stores data in a two-dimensional (row and column) form. It is an ordered collection of columns, where each column may hold a different type of data, e.g., integer, floating point, string, or Boolean.

Characteristics:

It has two indexes/axes:
Row index (axis=0) and column index (axis=1).
The row index is known as the index.
The column index is known as the column name.
Index labels can be numbers, letters, or strings.
Different columns can hold data of different types.
It is value-mutable (i.e., its values can be changed).
It is size-mutable (i.e., we can add or delete rows and columns).
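The characteristics listed above can be checked with a small sketch. The data and the names df, row_labels and col_labels are illustrative assumptions, not part of the original slides.

```python
import pandas as pd

# Illustrative data: each column holds a different type
df = pd.DataFrame(
    {"RNo": [51, 52], "SName": ["Lahari", "Chanakya"], "Passed": [True, False]},
    index=["First", "Second"],
)

row_labels = list(df.index)      # axis=0 labels
col_labels = list(df.columns)    # axis=1 labels

df.loc["First", "RNo"] = 61      # value-mutable: change an existing value
df["Marks"] = [55.5, 62.0]       # size-mutable: add a column
df = df.drop("Second")           # size-mutable: delete a row

print(row_labels, col_labels)
print(df)
```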

DATAFRAME - INTRODUCTION

4 of 91

DATAFRAMES - CREATION

5 of 91

CREATING A DATAFRAME

Before creating a DataFrame, we need to import pandas; NumPy is needed as well when ndarrays are used.

import pandas (or) import pandas as pd

import numpy (or) import numpy as np

(In the place of pd or np, we can use any valid identifier)

Syntax:

<dataFrameObject> = pandas.DataFrame(<a 2D data structure>, [columns=<column sequence>], [index=<index sequence>])

We can create a DataFrame using:

Two-dimensional dictionaries, i.e., dictionaries whose values are lists, dictionaries, ndarrays, Series objects, etc.
Two-dimensional ndarrays (NumPy arrays)
Series objects
Another DataFrame object

A DataFrame is displayed in the same way as other variables and objects, e.g., with print( ).

6 of 91

(i) Creating a DataFrame using a 2-D Dictionary:

A 2-D dictionary is a dictionary having items as (key:value), where the value part is itself a data structure of any type, i.e., another dictionary, an ndarray, a Series object, a list, etc.

The value parts of all the keys should have a similar structure.

(a) Creating a dataframe from a 2D dictionary having values as lists:

>>>dict={'RNo':[51,52,53,54],'SName': ['Lahari','Chanakya','Harish','Neha'], 'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

Program to create a dataframe using 2-D Dictionary having values as lists:

import pandas as pd

dict={'RNo':[51,52,53,54],'SName':

['Lahari','Chanakya','Harish','Neha'],

'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

print(df)

output

By default, its index will be assigned 0 (zero) onwards.

Note: According to the textbook, the output columns are placed in ascending order of name, i.e., "Marks", then "RNo", then "SName". In practice, with current versions of pandas, the columns appear in the order in which the keys were entered.
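The note on column order can be verified with a short sketch. It assumes a recent pandas release; the names data and sorted_df are illustrative. If the textbook's alphabetical order is wanted, the columns can be re-ordered explicitly.

```python
import pandas as pd

data = {'RNo': [51, 52], 'SName': ['Lahari', 'Chanakya'], 'Marks': [55, 62]}
df = pd.DataFrame(data)

# Columns follow the order in which the keys were entered
print(list(df.columns))

# Selecting a sorted list of column names gives the alphabetical order
sorted_df = df[sorted(df.columns)]
print(list(sorted_df.columns))

# The default index starts at 0
print(list(df.index))
```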

7 of 91

Specifying Own Index:

>>>df=pd.DataFrame(dict,index=['First','Second','Third','Fourth'])

Note: If the number of labels in the index sequence does not match the number of rows, a ValueError is raised.
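A minimal sketch of the mismatch described in this note, assuming a dictionary of lists with four rows and an index of only three labels. The names data and raised are illustrative.

```python
import pandas as pd

data = {'RNo': [51, 52, 53, 54], 'Marks': [55, 62, 52, 75]}  # 4 rows

raised = False
try:
    # Only 3 index labels are supplied for 4 rows
    pd.DataFrame(data, index=['First', 'Second', 'Third'])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```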

Example: Create a DataFrame from a dictionary, with state names as the index and "Mother Tongue" and "Population" as the column names. Note: Population is in crores.

Program:

import pandas as pd

dict={'Mother Tongue':['Telugu','Tamil','Hindi'],

'Population':[6,8,12]}

df=pd.DataFrame(dict,index=['AP','TN','Maharastra'])

print(df)

8 of 91

(b) Creating a dataframe from a 2D dictionary having values as dictionaries:

dict={'RNo':{'First':51,'Second':52,'Third':53,'Fourth':54},

'SName':{'First':'Lahari','Second':'Chanakya','Third':'Harish','Fourth':'Neha'},

'Marks':{'First':55,'Second':62,'Third':52,'Fourth':75}}

df=pd.DataFrame(dict)

dict={'First':{'RNo':51,'SName':'Lahari','Marks':55},

'Second':{'RNo':52,'SName':'Chanakya','Marks':62},

'Third':{'RNo':53,'SName':'Harish','Marks':52},

'Fourth':{'RNo':54,'SName':'Neha','Marks':75}}

df=pd.DataFrame(dict)

9 of 91

Special Condition:

A DataFrame can also be created from a 2D dictionary whose inner dictionaries have dissimilar (non-matching) keys.

All the inner keys become index labels, and NaN is stored wherever an inner dictionary has no value for a key.

Program:

import pandas as pd

C1={'Qty':95,'Half Yearly':89}

C2={'Half Yearly':94,'Annual':97}

Marks={'Student 1':C1,'Student 2':C2}

df=pd.DataFrame(Marks)

print(df)

OUTPUT

10 of 91

(ii) Creating a Dataframe Object from a List of Dictionaries/Lists:

(a) Creating a DataFrame from a list of dictionaries:

If we pass a list having dictionaries as its elements (a list of dictionaries) to the pandas.DataFrame() function, it creates a DataFrame object in which the keys of the inner dictionaries become the columns and the values of each inner dictionary make up one row.

Ex:

import pandas as pd

dict1={'RNo':51,'SName':'Lahari','Marks':55}

dict2={'RNo':52,'SName':'Chanakya','Marks':62}

dict3={'RNo':53,'SName':'Harish','Marks':52}

dict4={'RNo':54,'SName':'Neha','Marks':75}

students=[dict1,dict2,dict3,dict4]

df=pd.DataFrame(students)

print(df)

11 of 91

Note : We can also include indexes as follows:

df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])

Note: If the dictionaries do not all use the same keys (column names), the missing entries are filled with NaN.

Program:

import pandas as pd

dict1={'RNo':51,'SName':'Lahari','Marks':55}

dict2={'RNo':52,'Name':'Chanakya','Marks':62}

dict3={'RNo':53,'Name':'Harish','Marks':52}

dict4={'RNo':54,'SName':'Neha','Marks':75}

students=[dict1,dict2,dict3,dict4]

df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])

print(df)

OUTPUT

12 of 91

(b) Creating using a list having List of lists:

lists=[[10,20,40],['A','B','C','D'],[33.5,55.75,2.5]]

df=pd.DataFrame(lists)

Inserting Rows & Column Names:

import pandas as pd

lists=[[51,'Lahari',55],[52,'Chanakya',62],[53,'Harish',52]]

#each inner list is a row

df=pd.DataFrame(lists,columns=['RNo','SName','Marks'],index=['First','Second','Third'])

print(df)

13 of 91

(iii) Creating a dataframe Object from a 2-D ndarray:

We can pass a two-dimensional NumPy array (i.e., one having shape (<m>,<n>)) to DataFrame( ) to create a DataFrame object.

Consider the following program, which creates a NumPy array:

import numpy as np

import pandas as pd

narr=np.array([[10,20,30],[40,50,60]],np.int32)

print(narr)

Program:

import numpy as np

import pandas as pd

narr=np.array([[10,20,30],[40,50,60]],np.int32)

mydf=pd.DataFrame(narr)

print(mydf)

Output (of print(narr))

[[10 20 30]

[40 50 60]]

OUTPUT

14 of 91

narr=np.array([[10.7,20.5],[40,50],[25.2,55]])

mydf=pd.DataFrame(narr,columns=["One","Two"],index=['A','B','C'])

print(mydf)

We can specify either columns or index or both the sequences.

Note: If the rows of the ndarray differ in length, i.e., if the number of elements in each row differs, the array stores each row as a single list element. The DataFrame created from it has just a single column, and the type of that column is object.

Example:

narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")

print(narr)

Output

[list([10.7, 20.5, 30.2]) list([40, 50]) list([25, 55, 11, 45])]

15 of 91

Program:

narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")

mydf=pd.DataFrame(narr)

Output

(iv) Creating a dataframe Object from a 2D Dictionary with Values as Series Objects:

import pandas as pd

RN=pd.Series([11,12,13,14])

SN=pd.Series(['Rajesh','Likhith','Navya','Bhavya'])

M=pd.Series([56,75,91,82])

studict={'RNo':RN,'SName':SN,'Marks':M}

mydf=pd.DataFrame(studict)

print(mydf)

Output

16 of 91

(v) Creating a dataframe Object from another DataFrame Object:

Program:

import pandas as pd

dict={'RNo':[51,52,53,54],'SName':['Lahari','Chanakya',

'Harish','Neha'],'Marks':[55,62,52,75]}

df=pd.DataFrame(dict)

dfnew=pd.DataFrame(df)

print(dfnew)

OUTPUT

(new DataFrame created from existing DataFrame)

17 of 91

DATAFRAME - ATTRIBUTES

18 of 91

DATAFRAME ATTRIBUTES

All information related to a DataFrame, such as its size, datatype, etc., is available through its attributes.

Syntax to use a specific attribute: <DataFrame object>.<attribute name>

Attribute	Description
index	The index (row labels) of the DataFrame
columns	The column labels of the DataFrame
axes	Returns a list holding axis 0 (the index) and axis 1 (the columns) of the DataFrame
dtypes	Returns the data type of each column in the DataFrame
size	Returns an int representing the number of elements in the DataFrame
shape	Returns a tuple representing the dimensionality of the DataFrame, i.e., (no. of rows, no. of columns)
values	Returns a NumPy representation of the DataFrame
empty	Indicates whether the DataFrame is empty
ndim	Returns an int representing the number of axes/array dimensions
T	Transpose (rows become columns and vice versa)

19 of 91

Example of a DataFrame DF:

Retrieving various properties of a DataFrame Object:

>>>df.index

Index(['First', 'Second', 'Third', 'Fourth'], dtype='object')

(for default indexes)

>>>df.index #above example

RangeIndex(start=0, stop=4, step=1)

>>> df.columns

Index(['RNo', 'SName', 'Marks'], dtype='object')

>>>df.axes

[Index(['First', 'Second', 'Third', 'Fourth'], dtype='object'), Index(['RNo', 'SName', 'Marks'], dtype='object')]

>>>df.dtypes

RNo int64

SName object

Marks int64

dtype: object

>>>df.size # 4 rows x 3 columns

12

>>>df.shape #(no.of rows, no.of columns)

(4, 3)

>>>df.values# Numpy representation

[ [51 'Lahari' 55]

[52 'Chanakya' 62]

[53 'Harish' 52]

[54 'Neha' 75] ]

20 of 91

>>>df.empty

#if DataFrame is empty, gives True

False

>>>df.ndim # a DataFrame is two-dimensional

2

>>>df.T

#Transpose. Rows will become columns and vice versa.

Example of a DataFrame DF:

Function	Description
len(<DF Object>)	Returns the number of rows in a DataFrame
<DF Object>.count( )	With no argument or with 0 (the default), it returns the count of non-NA values for each column; with 1, it returns the count of non-NA values for each row.
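The difference between len( ) and count( ) shows up only when some values are missing. The sketch below uses made-up marks with two NaN entries; the names per_column and per_row are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing values
df = pd.DataFrame(
    {'Eng': [68, np.nan, 66], 'Tel': [55, 84, np.nan], 'Mat': [60, 70, 65]},
    index=['Raj', 'Pavan', 'Mohan'],
)

n_rows = len(df)               # number of rows, missing values included
per_column = df.count()        # axis=0: non-NA values in each column
per_row = df.count(axis=1)     # axis=1: non-NA values in each row

print(n_rows)
print(per_column)
print(per_row)
```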

OTHERS

21 of 91

>>>len(df)

4

>>>df.count( )

#df.count(0) or df.count(axis='index')

RNo 4

SName 4

Marks 4

dtype: int64

>>>df.count(1) # df.count(axis='columns')

First 3

Second 3

Third 3

Fourth 3

dtype: int64

>>>df.shape[0] # to get number of rows

4

>>>df.shape[1] # to get number of columns

3

22 of 91

OPERATIONS ON DATAFRAMES

SELECTING/ACCESSING DATA

MODIFYING, ADDING DATA

23 of 91

Create the following DataFrame in any method

24 of 91

import pandas as pd

dict={'Eng':[68,72,66],'Tel':[55,84,90],'Mat':[60,70,65],'Soc':[80,90,85]}

df=pd.DataFrame(dict,index=['Raj','Pavan','Mohan'])

print(df)

25 of 91

Selecting/Accessing a subset from a DataFrame using Row/Column Names with loc:

To access row(s) and/or a combination of rows and columns by label, we can use loc.

Syntax:

<DataFrame Object>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]

Note: With loc, both the start label and the end label are included when given as start:end.

Selecting/Accessing a subset from a DataFrame using Row/Column Positions with iloc:

With iloc, we can extract a subset from a DataFrame using the numeric row and column index/position. iloc means integer location.

Syntax:

<DF Object>.iloc[<start row index>:<end row index>, <start col index>:<end column index>]

Note: With iloc, as with slices, the end index/position is excluded when given as start:end.
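The inclusive/exclusive difference between the two notes can be seen side by side. This sketch uses the marks DataFrame from the slides; the names by_label, by_position and one_cell_block are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Eng': [68, 72, 66], 'Tel': [55, 84, 90], 'Mat': [60, 70, 65], 'Soc': [80, 90, 85]},
    index=['Raj', 'Pavan', 'Mohan'],
)

# loc: both end labels ('Pavan' and 'Mat') are included
by_label = df.loc['Raj':'Pavan', 'Eng':'Mat']

# iloc: end positions (row 2, column 3) are excluded
by_position = df.iloc[0:2, 0:3]

# Same block of 2 rows x 3 columns either way
print(by_label.equals(by_position))

# iloc[0:1, 0:2] therefore holds 1 row and 2 columns only
one_cell_block = df.iloc[0:1, 0:2]
print(one_cell_block)
```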

26 of 91

.at: Access a single value for a row/column pair by labels.

Syntax:<DF Object>.at[<row label>,<col label>]

.iat: Access a single value for a row/column pair by integer position.

Syntax:

<DF Object>.iat[<row index no>,<col index no>]
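A short sketch of both accessors on the marks DataFrame used in the slides. The variable names value_by_label and value_by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Eng': [68, 72, 66], 'Tel': [55, 84, 90], 'Mat': [60, 70, 65], 'Soc': [80, 90, 85]},
    index=['Raj', 'Pavan', 'Mohan'],
)

value_by_label = df.at['Pavan', 'Tel']   # row label, column label
value_by_position = df.iat[1, 1]         # row position, column position

df.at['Pavan', 'Tel'] = 99               # modify by label
df.iat[0, 0] = 70                        # modify by position (Raj, Eng)

print(value_by_label, value_by_position)
print(df)
```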

27 of 91

SINGLE COLUMN

SELECTING/ACCESSING a column:

Syntax:<DataFrame object> [<column name>]

(or)<DataFrame object>.<column name>

>>>df['Eng']

Raj 68

Pavan 72

Mohan 66

Name: Eng, dtype: int64

>>>df.Eng

Raj 68

Pavan 72

Mohan 66

Name: Eng, dtype: int64

MODIFYING a Column:

Note: Assigning values to a new column label that does not exist will create a new column at the end. If the column already exists in the DataFrame then the assignment statement will update the values of the already existing column, for example:

df['Eng']=[40,50,60]

df['Tel']=55

df.Mat=70,80,90

df.Soc=100

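Note: If we assign to a column name that does not yet exist using the dot notation,

>>> df.corporate=11,12,13 or

>>> df.corporate=[11,12,13]

no error is raised (pandas may show a UserWarning), but no column is added to the DataFrame; the values are stored only as an ordinary attribute of the object. Use df['corporate']=... to add a column.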
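The behaviour in this note can be checked directly. The sketch below is a minimal illustration on a cut-down marks DataFrame; the warning filter is there only because pandas may warn about the attribute assignment, and the flag names are assumptions.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Eng': [68, 72, 66], 'Tel': [55, 84, 90]},
                  index=['Raj', 'Pavan', 'Mohan'])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")      # pandas may warn about this assignment
    df.corporate = [11, 12, 13]          # sets a plain attribute, not a column

added_by_attribute = 'corporate' in df.columns

df['Hin'] = [89, 78, 76]                 # bracket notation does add a column
added_by_brackets = 'Hin' in df.columns

df.Eng = [40, 50, 60]                    # dot notation can update an existing column

print(added_by_attribute, added_by_brackets)
print(df)
```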

ADDING a Column:

>>>df['Hin']=[89,78,76]

28 of 91

SELECTING/ACCESSING a column (loc):

>>>df.loc[:,'Eng']

Raj 68

Pavan 72

Mohan 66

Name: Eng, dtype: int64

MODIFYING a Column (loc):

>>>df.loc[:,'Eng']=[10,20,30]

# df.loc[:,'Eng']=10,20,30

>>>df.loc[:,'Mat']=100

ADDING a Column (loc):

>>>df.loc[:,'IP']=[10,20,30]

>>>df.loc[:,'Hin']=50

29 of 91

SELECTING/ACCESSING a column (iloc):

>>>df.iloc[:,1]

Raj 55

Pavan 84

Mohan 90

Name: Tel, dtype: int64

>>>df.iloc[:,[1]]

Tel

Raj 55

Pavan 84

Mohan 90

30 of 91

MODIFYING a Column (iloc):

>>>df.iloc[:,1]=[40,50,60]

>>>df.iloc[:,3]=70

Note: We cannot add a column using iloc.

If we try to add a new column using iloc, an IndexError is raised.

Ex:

>>>df.iloc[:,4]=95

IndexError : iloc cannot enlarge its target object

>>> df.iloc[:,1:3]=[[1,2],[3,4],[5,6]]

31 of 91

MULTIPLE COLUMNS

SELECTING/ACCESSING multiple column:

<DataFrame object>[ [<column name>,<column name>,…..] ]

>>>df[['Tel','Soc','Mat']]

MODIFYING multiple Columns values:

>>>df[['Tel','Soc','Mat']]=10,20,30

# df[['Tel','Soc','Mat']]=[10,20,30]

>>> df[['Tel','Soc','Mat']]=[[1,2,3],[4,5,6],[7,8,9]]

32 of 91

SELECTING/ACCESSING multiple columns (loc):

>>> df.loc[:,'Eng':'Mat']

Note: All columns between start and end columns are listed.

>>> df.loc[:,'Tel':]

>>>df.loc[:,'Mat':'Eng']

Empty DataFrame

Columns: []

Index: [Raj, Pavan, Mohan]

>>>df.loc[:,['Soc','Tel','Eng']]

33 of 91

MODIFYING multiple Columns values (loc):

>>>df.loc[:,'Eng':'Mat']=50,60,70

>>>df.loc[:,['Soc','Tel','Eng']]=10,20,30

34 of 91

SELECTING/ACCESSING multiple columns (iloc):

>>> df.iloc[:,1:3] #Excluding column 3

>>> df.iloc[:,[2,0]]

>>> df.iloc[:,1:]

>>> df.iloc[:,2:0]

Empty DataFrame

Columns: []

Index: [Raj, Pavan, Mohan]

>>> df.iloc[:,[2,0,1]]

MODIFYING multiple Columns values (iloc):

>>>df.iloc[:,1:3]=[25,35]

36 of 91

HEAD & TAIL FUNCTIONS

head(n): To display the first n rows in the DataFrame. Default value of n is 5.

tail(n): To display the last n rows in the DataFrame. Default value of n is 5.

Create the following DataFrame “MyDF”.

Execute the following commands:

MyDF.head(3)

MyDF.head( )

MyDF.head(15)

MyDF.head(-3)

MyDF.tail(3)

MyDF.tail( )

MyDF.tail(777)

MyDF.tail(-3)
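The effect of each command above depends only on the number of rows. The sketch below assumes a hypothetical 10-row MyDF (the original table is not reproduced here) and records how many rows each call returns; a negative n means "all rows except the last n" for head( ) and "all rows except the first n" for tail( ).

```python
import pandas as pd

# Hypothetical stand-in for MyDF: 10 rows
MyDF = pd.DataFrame({'N': range(1, 11)})

rows_returned = {
    'head(3)': len(MyDF.head(3)),      # first 3 rows
    'head()': len(MyDF.head()),        # default n=5
    'head(15)': len(MyDF.head(15)),    # n larger than the DataFrame: all rows
    'head(-3)': len(MyDF.head(-3)),    # all rows except the last 3
    'tail(3)': len(MyDF.tail(3)),      # last 3 rows
    'tail()': len(MyDF.tail()),        # default n=5
    'tail(777)': len(MyDF.tail(777)),  # n larger than the DataFrame: all rows
    'tail(-3)': len(MyDF.tail(-3)),    # all rows except the first 3
}
print(rows_returned)

first_of_tail_minus3 = MyDF.tail(-3)['N'].iloc[0]
print(first_of_tail_minus3)
```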

37 of 91

Create the following DataFrame in any method

38 of 91

SINGLE ROW

SELECTING/ACCESSING one row (loc):

Just give the row name/label.

>>>df.loc['Pavan']

# df.loc['Pavan',] or df.loc['Pavan',:]

Eng 72

Tel 84

Mat 70

Soc 90

Name: Pavan, dtype: int64

>>> df.loc['Kiran']

KeyError: 'Kiran'

MODIFYING one row (loc):

>>>df.loc["Raj"]=91,92,93,94

#df.loc["Raj",:] = [91,92,93,94]

>>>df.loc["Pavan"]=100

>>> df.loc['Mohan',:]=601,602,603

ValueError: could not broadcast input array from shape (3,) into shape (4,)

39 of 91

ADDING one row (loc):

>>>df.loc['Kumar']=91,92,93,94

Note: If we try to add a row with lesser values than the number of columns in the DataFrame, it results in a ValueError, with the error message: ValueError: Cannot set a row with mismatched columns.

Similarly, if we try to add a column with lesser values than the number of rows in the DataFrame, it results in a ValueError, with the error message: ValueError: Length of values does not match length of index.
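Both error cases in these notes can be reproduced with a small sketch on the marks DataFrame. The flag names row_error and col_error are illustrative; the exact wording of the error messages may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {'Eng': [68, 72, 66], 'Tel': [55, 84, 90], 'Mat': [60, 70, 65], 'Soc': [80, 90, 85]},
    index=['Raj', 'Pavan', 'Mohan'],
)

row_error = False
try:
    df.loc['Kumar'] = [91, 92, 93]     # 3 values for 4 columns
except ValueError as err:
    row_error = True
    print("Row:", err)

col_error = False
try:
    df['Hin'] = [89, 78]               # 2 values for 3 rows
except ValueError as err:
    col_error = True
    print("Column:", err)

# Neither failed assignment changes the DataFrame
print(df.shape)
```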

40 of 91

SELECTING/ACCESSING one row (iloc):

>>>df.iloc[1] #df.iloc[1,] or df.iloc[1,:]

Eng 72

Tel 84

Mat 70

Soc 90

Name: Pavan, dtype: int64

>>> df.iloc[4]

IndexError: single positional indexer is out-of-bounds

41 of 91

MODIFYING one row (iloc):

>>>df.iloc[2]=75

>>>df.iloc[1]=81,82,83,84

# df.iloc[1]=[81,82,83,84]

# df.iloc[1,:]=[81,82,83,84]

Note: We cannot add a row using iloc.

If we try to add a new row using iloc, an IndexError is raised.

Ex:

>>>df.iloc[4]=91,92,93,94

IndexError : iloc cannot enlarge its target object

>>> df.iloc[[2,0]]=[[100,200,300,400],[11,22,33,44]]

42 of 91

MULTIPLE ROWS

SELECTING/ACCESSING multiple rows (loc):

>>>df.loc['Raj':'Mohan']

# df.loc['Raj':'Mohan', ] or df.loc['Raj':'Mohan', :]

>>>df.loc['Pavan':'Mohan']

>>>df.loc[['Mohan','Raj']]

>>>df.loc['Pavan':'Raj']

Empty DataFrame

Columns: [Eng, Tel, Mat, Soc]

Index: [ ]

43 of 91

MODIFYING multiple rows (loc):

>>> df.loc[['Mohan','Raj']]=[[1,2,3,4],[5,6,7,8]]

SELECTING/ACCESSING multiple rows (iloc):

>>> df.iloc[0:3] # df.iloc[0:3,] or df.iloc[0:3,:]

>>>df.iloc[0:2]

>>>df.iloc[1:10]

>>>df.iloc[1:1]

Empty DataFrame

Columns: [Eng, Tel, Mat, Soc]

Index: [ ]

>>>df.iloc[[2,1]] #df.iloc[[2,1], ] or df.iloc[[2,1], : ]

44 of 91

MODIFYING multiple rows (iloc):

>>>df.iloc[0:2]=[[1,2,3,4],[5,6,7,8]]

Selecting/Modifying all rows (using the slice [ : ]):

>>>df[ : ]

>>>df[ : ] = 10

45 of 91

RANGE OF COLUMNS

FROM A RANGE OF ROWS

SELECTING/ACCESSING range of columns from a range of rows (loc):

<DF Object>.loc[<startrow>:<endrow>,

<startcolumn>:<endcolumn>]

>>> df.loc['Pavan':'Mohan','Tel':'Soc']

>>>df.loc['Mohan':'Raj','Eng':'Soc']

Empty DataFrame

Columns: [Eng, Tel, Mat, Soc]

Index: []

>>>df.loc['Raj':'Pavan','Mat':'Eng']

Empty DataFrame

Columns: []

Index: [Raj, Pavan]

46 of 91

MODIFYING range of columns from a range of rows (loc):

>>>df.loc['Pavan':'Mohan','Tel':'Soc']=[[1,2,3],[4,5,6]]

SELECTING/ACCESSING range of columns from a range of rows (iloc):

>>> df.iloc[1:3,0:2] #Rows 1,2 & Columns 0,1

>>>df.iloc[[1,2],[2,0,1]]

>>> df.iloc[2:2,0:2]

Empty DataFrame

Columns: [Eng, Tel]

Index: []

>>> df.iloc[1:3,2:0]

Empty DataFrame

Columns: []

Index: [Pavan, Mohan]

>>> df.iloc[[1,3],0:2]

IndexError: positional indexers are out-of-bounds

MODIFYING range of columns

from a range of rows (iloc):

>>>df.iloc[0:2,1:4]=[[21,22,23],[31,32,33]]

48 of 91

RENAMING ROWS/COLUMNS

To change the name of any row/column individually, we can use the rename( ) function.

rename( ) function by default does not make changes in the original dataframe. It creates a new dataframe with the changes and the original dataframe remains unchanged.

Syntax:

<DF>.rename(index={<names dictionary>},

columns={<names dictionary>}, inplace=False)

Renaming Row Indexes:

>>>df.rename(index={'Raj':'Mr.Rajesh','Mohan':'Mohan Garu'},inplace=True)

Renaming Column Indexes (Column Labels):

>>> df.rename(columns={'Eng':'English', 'Mat':'Maths'},inplace=True)

49 of 91

Another Example:

dict={'RNo':[51,52,53],'SName':['Suresh','Naresh','Bhavesh']}

df=pd.DataFrame(dict, index=['First','Second','Third'])

>>>df.rename(index={'Second':'Two'}, columns={'RNo':'RollNo'},inplace=True)

Note: If we do not add "inplace=True", the command only displays a renamed copy; the original DataFrame is not modified. To modify the original DataFrame, we need to add "inplace=True" (or assign the result back, e.g., df=df.rename(...)).
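The effect of inplace can be seen by comparing the labels before and after each call. This sketch reuses the example data from this slide; renamed and original_unchanged are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'RNo': [51, 52, 53], 'SName': ['Suresh', 'Naresh', 'Bhavesh']},
                  index=['First', 'Second', 'Third'])

# Without inplace=True: a renamed copy is returned, df itself is untouched
renamed = df.rename(index={'Second': 'Two'}, columns={'RNo': 'RollNo'})
original_unchanged = list(df.columns) == ['RNo', 'SName']

# With inplace=True: df itself is modified and nothing is returned
df.rename(index={'Second': 'Two'}, columns={'RNo': 'RollNo'}, inplace=True)

print(original_unchanged)
print(df)
```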

50 of 91

Create the following DataFrame in any method

51 of 91

SINGLE VALUE

SELECTING/ACCESSING a single value:

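Give either the row name or the numeric index in square brackets.

Syntax:<DF Object>.<column>[<row name or row numeric index>]

Ex: >>>df.Eng['Pavan']

MODIFYING a single value:

>>>df.Eng['Pavan']=200 will change the value to 200

>>> df.Tel[0]=500

Note: This form is a chained assignment. Depending on the pandas version, it may show a warning or leave the DataFrame unchanged, so loc, iloc, at or iat are the safer way to modify a single value.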

52 of 91

SELECTING/ACCESSING a single value (loc):

>>>df.loc['Pavan','Mat']

100

MODIFYING a single value (loc):

Specify the row label and the column name, then assign the new value.

>>>df.loc['Pavan','Mat']=100

SELECTING/ACCESSING a single value (iloc):

>>>df.iloc[2,3]

MODIFYING a single value (iloc):

>>>df.iloc[2,3]=500

53 of 91

.at function: Access a single value for a row/column label pair by labels.

Syntax:<DF Object>.at[<row label>,<col label>]

>>> df.at['Raj','Mat']

>>> df.at['Raj','Mat']=150 will change the value to 150

>>> df.at['Kiran','Soc']

KeyError: 'Kiran'

>>> df.at['Raj','IP']

KeyError: 'IP'

54 of 91

.iat: Access a single value for a row/column pair by integer position.

Syntax:

<DF Object>.iat[<row index no>,<col index no>]

>>> df.iat[2,2]

# df.iat[2,3]=30 will change the value to 30

55 of 91

ASSIGN FUNCTION

<DF object>=<DF object>.assign(<column name>=<values for column>)

>>> df=df.assign(Mat=[10,11,12])

>>>df=df.assign(IP=[81,82,83])

>>>df=df.assign(Tel=77)

>>>df=df.assign(New=[55,56])

ValueError: Length of values (2) does not match length of index (4)
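assign( ) returns a new DataFrame and leaves the original untouched, which is why the slides write df=df.assign(...). A minimal sketch on a cut-down marks DataFrame; with_ip, with_tel and length_error are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Eng': [68, 72, 66], 'Tel': [55, 84, 90]},
                  index=['Raj', 'Pavan', 'Mohan'])

with_ip = df.assign(IP=[81, 82, 83])   # new column in the returned copy
with_tel = df.assign(Tel=77)           # existing column replaced in the copy

length_error = False
try:
    df.assign(New=[55, 56])            # 2 values for 3 rows
except ValueError as err:
    length_error = True
    print("ValueError:", err)

print(list(df.columns))                # the original is unchanged
print(with_ip)
```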

56 of 91

DELETING ROWS/COLUMNS

There are two ways to delete rows and columns: the del statement (columns only) and the drop( ) method.

We can use the DataFrame.drop() method to delete rows and columns from a DataFrame. We need to specify the labels to be dropped and the axis from which they are to be dropped. To delete a row, the parameter axis is assigned the value 0; to delete a column, the parameter axis is assigned the value 1.

(i) Delete row(s) using drop( ) function:

Syntax:<DF>.drop(index or sequence of indexes)

>>> df.drop('Pavan',axis=0,inplace=True)

#df.drop('Pavan',inplace=True)

#df=df.drop('Pavan',axis=0)

# Default axis is 0, so no need to give

57 of 91

>>> df.drop(['Raj','Pavan'],inplace=True)

Note: If the DataFrame has more than one row with the same label, the DataFrame.drop() method will delete all the matching rows from it.

(Other examples:

df.drop(range(2,15,3)) – 2,5,8,11,14

df.drop([2,4,6,8,12])

Argument to drop( ) should be either an index, or a sequence containing indexes.)
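The note about repeated labels can be illustrated with a sketch in which the label 'Raj' appears twice. The data and the names after_row_drop and after_col_drop are assumptions made for the example.

```python
import pandas as pd

# Illustrative data: the row label 'Raj' is used twice
df = pd.DataFrame({'Eng': [68, 72, 66, 50], 'Tel': [55, 84, 90, 40]},
                  index=['Raj', 'Pavan', 'Raj', 'Mohan'])

after_row_drop = df.drop('Raj')            # axis=0 by default: both 'Raj' rows go
after_col_drop = df.drop('Tel', axis=1)    # axis=1: drop a column

print(after_row_drop)
print(after_col_drop)
print(df.shape)                            # df is unchanged without inplace=True
```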

(ii) Delete a column, using drop( ) function:

>>> df.drop('Tel',axis=1,inplace=True)

>>>df.drop(['Soc','Eng'],axis=1,inplace=True)

58 of 91

(iii) Delete a column, using the del statement:

Syntax: del <DF object>[<column name>]

>>> del df['Mat']

59 of 91

ITERATION (Pandas 2 Chapter)

Iterating Over a Data Frame

Iterating Over a DataFrame:

>>> dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

>>>DF=pd.DataFrame(dict,index=['Private','Govt'])

60 of 91

iterrows( ) : This method iterates over dataframe row wise where each horizontal subset is in the form of (row-index,Series) where Series contains all column values for that row-index.

Example Program: Using iterrows( ) to extract data from a dataframe row wise.

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for (row,rowSeries) in DF.iterrows():

    print("Row index:", row)

    print("Containing: ")

    print(rowSeries)

Row index: Private

Containing:

Teachers 20

Students 200

Ratio 10

Name: Private, dtype: int64

Row index: Govt

Containing:

Teachers 10

Students 150

Ratio 15

Name: Govt, dtype: int64

OUTPUT

61 of 91

Example : Using iterrows( ) to extract row-wise Series objects

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for (row,rowSeries) in DF.iterrows():

    print("Row index:",row)

    print("Containing: ")

    i=0

    for val in rowSeries:

        print("At",i,"position: ",val)

        i=i+1

OUTPUT

Row index: Private

Containing:

At 0 position: 20

At 1 position: 200

At 2 position: 10

Row index: Govt

Containing:

At 0 position: 10

At 1 position: 150

At 2 position: 15

62 of 91

Write a program to print the DataFrame DF, one row at a time

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for i,j in DF.iterrows():

    print(i)

    print(j)

    print("____________")

OUTPUT

Private

Teachers 20

Students 200

Ratio 10

Name: Private, dtype: int64

____________

Govt

Teachers 10

Students 150

Ratio 15

Name: Govt, dtype: int64

63 of 91

Printing individual columns from a row:

When accessing the rows of a DataFrame using iterrows( ), we can print an individual column value from a row using rowSeries[<column>], i.e., after the line

for r, Row in df.iterrows( ):

we can print an individual column value as:

Row[<column name>]

Write a program to print only the values from Teachers column, for each row

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for row,rowSeries in DF.iterrows():

    print(rowSeries['Teachers'])

    print("------")

OUTPUT

20

------

10

------

64 of 91

items( ) (formerly iteritems( )): This method iterates over a dataframe column wise, where each vertical subset is in the form of (col-index, Series) and the Series contains all row values for that column index.

Note: iteritems( ) was removed in pandas 2.0; in present versions, use items( ) instead.

Example: Using items( ) to extract data from a dataframe column wise.

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for (col,colSeries) in DF.items(): # iteritems( ) in older versions

    print("Column index:",col)

    print("Containing: ")

    print(colSeries)

Column index: Teachers

Containing:

Private 20

Govt 10

Name: Teachers, dtype: int64

Column index: Students

Containing:

Private 200

Govt 150

Name: Students, dtype: int64

Column index: Ratio

Containing:

Private 10

Govt 15

Name: Ratio, dtype: int64

OUTPUT

65 of 91

Example: Using items( ) to extract column-wise Series objects from a dataframe

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for (col,colSeries) in DF.items(): # iteritems( ) in older versions

    print("Column index:",col)

    print("Containing: ")

    i=0

    for val in colSeries:

        print("At row ",i,":",val)

        i=i+1

OUTPUT

Column index: Teachers

Containing:

At row 0 : 20

At row 1 : 10

Column index: Students

Containing:

At row 0 : 200

At row 1 : 150

Column index: Ratio

Containing:

At row 0 : 10

At row 1 : 15

66 of 91

Write a program to print the DataFrame DF, one column at a time

import pandas as pd

dict={'Teachers':[20,10],'Students':[200,150],

'Ratio':[10,15]}

DF=pd.DataFrame(dict,index=['Private','Govt'])

for i,j in DF.items(): # iteritems( ) in older versions

    print(i)

    print(j)

    print("____________")

OUTPUT

Teachers

Private 20

Govt 10

Name: Teachers, dtype: int64

____________

Students

Private 200

Govt 150

Name: Students, dtype: int64

____________

Ratio

Private 10

Govt 15

Name: Ratio, dtype: int64

67 of 91

INDEXING

Data elements in a DataFrame can be accessed using indexing.

There are two ways of indexing Dataframes :

Label based indexing and Boolean Indexing.

LABEL BASED INDEXING

Note: This topic has already been covered, but it is discussed again here under the heading "Indexing".

There are several ways in Pandas to implement label-based indexing.

DataFrame.loc[ ] is an important one.

68 of 91

>>> df.loc['Pavan']

Note: When the row label is passed as an integer value, it is interpreted as a label of the index and not as an integer position along the index.

69 of 91

Ex:

>>>MyDF = pd.DataFrame([10,20,30,40,50])

>>>MyDF

>>>MyDF.loc[3]
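With the default index, labels and positions coincide, so the difference is easier to see when the integer labels are not 0, 1, 2, and so on. The sketch below assumes a hypothetical index of [5, 4, 3, 2, 1] for MyDF.

```python
import pandas as pd

# Same values as MyDF above, but with an assumed non-default integer index
MyDF = pd.DataFrame([10, 20, 30, 40, 50], index=[5, 4, 3, 2, 1])

by_label = MyDF.loc[3, 0]       # row whose label is 3 (the third row)
by_position = MyDF.iloc[3, 0]   # row at position 3 (the fourth row)

print(by_label, by_position)
```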

When a single column label is passed, it returns the column as a Series.

>>>df.loc[:,'Mat']

Also, we can obtain the same result that is the marks of ‘Mat’ subject by using the command:

>>>df['Mat']

To read more than one row from a DataFrame, a list of row labels is used as shown below.

Note that using [[ ]] returns a DataFrame.

>>>df.loc[['Mohan', 'Raj']]

70 of 91

BOOLEAN INDEXING

Boolean Indexing – NCERT

Boolean means a binary variable that can represent either of the two states: True (indicated by 1) or False (indicated by 0).

In Boolean indexing, we can select the subsets of data based on the actual values in the DataFrame rather than their row/column labels.

Thus, we can use conditions on column names to filter data values.

Consider the above DataFrame df, the following statement displays True or False depending on whether the data value satisfies the given condition or not.

>>> df.loc['Mohan']>75

To check which students scored more than 75 in the 'Mat' subject, we can write:

>>> df.loc[:,'Mat']>75
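A Boolean Series like the one above can be used as a mask to keep only the rows where it is True. The sketch below uses the marks DataFrame with an assumed threshold of 60 so that some rows pass; mask and selected are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'Eng': [68, 72, 66], 'Tel': [55, 84, 90], 'Mat': [60, 70, 65], 'Soc': [80, 90, 85]},
    index=['Raj', 'Pavan', 'Mohan'],
)

mask = df.loc[:, 'Mat'] > 60    # Boolean Series: one True/False per row
selected = df[mask]             # keep only the rows where the mask is True

print(mask)
print(selected)
```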

71 of 91

Boolean Indexing Example 2

(From Other Material)

import pandas as pd

import numpy as np

dic = {'std1':{'no':101,'name':'hari','city':'tenali'}, 'std2':{'no':102,'name':'vasu','city':'guntur'},

'std3':{'no':103,'name':'kishore','city':'bapatla'}}

df = pd.DataFrame(dic)

print(df)

***

        std1    std2     std3
no       101     102      103
name    hari    vasu  kishore
city  tenali  guntur  bapatla

***

72 of 91

lis = pd.DataFrame([[12,14,16],[18,20,22],[24,26,28]])

print(lis)

***

    0   1   2
0  12  14  16
1  18  20  22
2  24  26  28

***

lis>20

***

       0      1      2
0  False  False  False
1  False  False   True
2   True   True   True

73 of 91


lis[lis>20]

      0     1     2
0   NaN   NaN   NaN
1   NaN   NaN  22.0
2  24.0  26.0  28.0

***

lis.loc[1]>20

***

0 False

1 False

2 True

Name: 1, dtype: bool

74 of 91

#creating a dataframe

dic1 = {'kiran':{'mat':67,'sci':89,'soc':93},

'rajani':{'mat':95,'sci':96,'soc':99},

'rani':{'mat':99,'sci':100,'soc':91}}

df2 = pd.DataFrame(dic1)

print(df2)

***

     kiran  rajani  rani
mat     67      95    99
sci     89      96   100
soc     93      99    91

***

#filtering on row

df2.loc['mat']>90

***

kiran False

rajani True

rani True

Name: mat, dtype: bool

***

#filtering on column

df2.loc[:,'rajani']>95

***

mat False

sci True

soc True

Name: rajani, dtype: bool

75 of 91


#according to sumitha arora

df2>90

***

     kiran  rajani  rani
mat  False    True  True
sci  False    True  True
soc   True    True  True

***

#according to sumitha arora

df2[df2>90]

***

     kiran  rajani  rani
mat    NaN      95    99
sci    NaN      96   100
soc   93.0      99    91

76 of 91

Accessing DataFrames Element through Slicing:

We can use slicing to select a subset of rows and/or columns from a DataFrame. To retrieve a set of rows, slicing can be used with row labels.

For example:

>>>df.loc['Pavan':'Mohan']

Here, the rows with labels Pavan and Mohan are displayed.

Note that label-based slicing with loc is inclusive of the end values. We may also combine row labels with a column name to access the values of those rows in that column only.

For example, the following statement displays the rows with label Raj and Mohan, and column with label Soc:

>>> df.loc[['Raj','Mohan'],'Soc']

77 of 91

>>> df.loc['Pavan', 'Eng':'Mat']

>>> df.loc['Raj': 'Mohan', 'Tel':'Soc']

>>>df.loc['Pavan': 'Mohan',['Eng','Mat']]

78 of 91

Filtering Rows in DataFrames

In DataFrames, the Boolean values True (1) and False (0) can be associated with indices. They can also be used to filter records using DataFrame.loc[ ].

In order to select or omit particular row(s), we can use a Boolean list specifying True for the rows to be shown and False for the ones to be omitted in the output. The list must contain one value per row.

For example, in the first statement below, the second row (Pavan) is omitted:

>>> df.loc[[True, False, True]]

>>> df.loc[[True, True, False]]

79 of 91

>>> df.loc[[False, True, True]]

>>> df.loc[[True, True, True]]

>>> df.loc[[False,False,False]]

80 of 91

BINARY OPERATIONS IN A DATAFRAME

Binary operations are operations that require two values; in a DataFrame, these values are picked element-wise.

In a binary operation, the data from the two DataFrames is aligned on the basis of their row and column indexes. For each matching row and column index, the given operation is performed; for each non-matching row or column index, NaN is stored in the result.
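The alignment rule can be sketched with two DataFrames that share columns A and B but not C. The data matches DF1 and DF3 defined on the next slide; total and filled are illustrative names, and fill_value is an optional parameter of add( ) that substitutes a value for missing entries before the operation.

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})

# Column C exists only in DF1, so every result in C is NaN
total = DF1 + DF3

# With fill_value=0, the missing DF3 values are treated as 0
filled = DF1.add(DF3, fill_value=0)

print(total)
print(filled)
```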

81 of 91

Binary Operations:

addition, subtraction, multiplication, division

import pandas as pd

dict1={'A':[11,17,23],'B':[13,19,25],'C':[15,21,27]}

DF1=pd.DataFrame(dict1)

dict2={'A':[12,18,24],'B':[14,20,26],'C':[16,22,28]}

DF2=pd.DataFrame(dict2)

dict3={'A':[1,3,5],'B':[2,4,6]}

DF3=pd.DataFrame(dict3)

dict4={'A':[7,9],'B':[8,10]}

DF4=pd.DataFrame(dict4)

82 of 91

Addition : [ Using +, add( ), radd( ) ]

Note : DF1.add(DF2) is equal to DF1+DF2

DF1.radd(DF2) is equal to DF2+DF1

radd( ) means reverse addition

>>>DF1+DF2 #DF1.add(DF2)

>>>DF1+DF3

>>>DF1+DF4

>>>DF3+DF4 #DF3.add(DF4)

83 of 91

Subtraction: [ Using -, sub( ), rsub( ) ]

Note : DF1.sub(DF2) is equal to DF1-DF2

DF1.rsub(DF2) is equal to DF2-DF1

rsub( ) means reverse subtraction

>>>DF1-DF2

>>>DF2-DF1

>>>DF1-DF3

>>>DF3-DF1

>>>DF3-DF4 #DF3.sub(DF4)

>>>DF4-DF3 #DF3.rsub(DF4)
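Unlike addition, subtraction depends on the order of the operands, which is where rsub( ) becomes useful. A minimal sketch, reusing DF3 and DF4 from above (the names forward and reverse are illustrative only):

```python
import pandas as pd

DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

forward = DF3.sub(DF4)    # same as DF3 - DF4
reverse = DF3.rsub(DF4)   # same as DF4 - DF3

print(forward)   # rows 0 and 1 hold -6.0; row 2 is NaN (no row 2 in DF4)
print(reverse)   # rows 0 and 1 hold 6.0; row 2 is NaN
```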

84 of 91

Multiplication: [ Using *, mul( ), rmul( ) ]

Note : DF1.mul(DF2) is equal to DF1*DF2

DF1.rmul(DF2) is equal to DF2*DF1

rmul( ) means reverse multiplication

>>>DF1*DF2

>>>DF1*DF3

Division: [ Using /, div( ), rdiv( ) ]

Note : DF1.div(DF2) is equal to DF1/DF2

DF1.rdiv(DF2) is equal to DF2/DF1

rdiv( ) means reverse division.

>>>DF1/DF2

>>>DF2/DF1

>>>DF2/DF3
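Multiplication and division follow the same alignment rule. The sketch below reuses DF1, DF2 and DF3 from above; the names product and quotient are assumptions made for illustration.

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF2 = pd.DataFrame({'A': [12, 18, 24], 'B': [14, 20, 26], 'C': [16, 22, 28]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})

product = DF1.mul(DF3)    # same as DF1 * DF3; column C is NaN
quotient = DF2.div(DF3)   # same as DF2 / DF3; column C is NaN

print(product)
print(quotient)

# rdiv() swaps the operands: DF2.rdiv(DF3) computes DF3 / DF2.
print(DF2.rdiv(DF3).equals(DF3 / DF2))
```

Division with / always produces floating-point values, even when both dataframes hold integers.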

85 of 91

DATAFRAME ATTRIBUTES

All information related to a DataFrame, such as its size, datatype, etc., is available through its attributes.

Syntax to use a specific attribute: <DataFrame object>.<attribute name>

Attribute	Description
index	To display/assign the index (row labels) of the DataFrame.
columns	To display/assign the column labels of the DataFrame.
axes	Returns both axis 0 i.e., index and axis 1 i.e., columns of the DataFrame.
dtypes	To display the data type of each column in the DataFrame.
values	To display a NumPy ndarray having all the values in the DataFrame, without the axes labels.
shape	Returns a tuple representing the dimensionality of the DataFrame i.e., (no. of rows, no. of columns).
size	Returns an int representing the number of elements in the DataFrame.
empty	Returns True if the DataFrame is empty and False otherwise.
ndim	Returns an int representing the number of axes/array dimensions.
T	To transpose the DataFrame i.e., the row indices and column labels of the DataFrame replace each other's position.

86 of 91

Retrieving various properties of a DataFrame Object:

>>>df.index

Index(['Raj', 'Pavan', 'Mohan'], dtype='object')

>>> df.index=['A','B','C']

>>> df.columns

Index(['Eng', 'Tel'], dtype='object')

>>>df.columns=['M','N']

>>>df

87 of 91

>>> df.axes

[Index(['Raj', 'Pavan', 'Mohan'], dtype='object'), Index(['Eng', 'Tel'], dtype='object')]

>>> df.dtypes

>>>df.values # Numpy representation

>>> df.shape #(no. of rows, no. of columns)

(3, 2)

>>> df.size

6 #3 rows x 2 columns

>>> df.empty

#if DataFrame is empty, gives True

False

>>>df.ndim

2 #A DataFrame is always 2-dimensional

>>>df.T

#Transpose. Rows will become columns and vice versa.
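The attributes listed above can be checked together on one DataFrame. This is a minimal sketch: the row and column labels follow the examples above, while the marks are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical marks; only the labels come from the examples above.
df = pd.DataFrame(
    {'Eng': [67, 78, 59], 'Tel': [72, 81, 64]},
    index=['Raj', 'Pavan', 'Mohan'],
)

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.axes)      # list holding the row index and the column index
print(df.dtypes)    # data type of each column
print(df.values)    # NumPy ndarray without the axes labels
print(df.shape)     # (3, 2)
print(df.size)      # 6
print(df.empty)     # False
print(df.ndim)      # 2
print(df.T)         # rows become columns and vice versa
```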

88 of 91

Other Functions

Function	Description
len(<DF Object>)	Returns the number of rows in a dataframe.
<DF Object>.count( )	If we pass no argument or 0 (default is 0), it returns the count of non-NA values for each column; if we pass 1, it returns the count of non-NA values for each row.

>>>len(df)

>>>df.count( ) #df.count(0) or df.count(axis='index')

RNo 4

SName 4

Marks 4

dtype: int64

>>>df.count(1) #df.count(axis='columns')

First 3

Second 3

Third 3

Fourth 3

dtype: int64

>>>df.shape[0]# to get number of rows

>>>df.shape[1]# to get number of columns
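The difference between len( ) and count( ) shows up only when some values are missing. The sketch below assumes the student data used earlier, with one mark deliberately replaced by NaN for illustration; the row labels First to Fourth follow the output above.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'RNo': [51, 52, 53, 54],
     'SName': ['Lahari', 'Chanakya', 'Harish', 'Neha'],
     'Marks': [55, np.nan, 52, 75]},   # one missing mark, added for illustration
    index=['First', 'Second', 'Third', 'Fourth'],
)

rows = len(df)                       # counts rows, missing values included
per_column = df.count()              # non-NA values in each column
per_row = df.count(axis='columns')   # non-NA values in each row

print(rows)         # 4
print(per_column)   # RNo 4, SName 4, Marks 3
print(per_row)      # First 3, Second 2, Third 3, Fourth 3
```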

89 of 91

RECORD PROGRAMS (DATAFRAMES : 6 to 13)

6. Write a Program to create

(a) A Dataframe DF1 using Dictionary having values as lists and display it

DF1

	Pavan	Srinu	Sunitha
Telugu	56	90	78
English	75	91	64
Maths	82	98	96
Science	72	92	88
Social	68	85	76

(b) Add a new Row with index "Hindi" with values 79, 89 and 99 in DF1

(c) A Dataframe DF2 using a list of lists and display it

DF2

	SNo	BookName	Price
One	1	C++	550
Two	2	Python	625
Three	3	Java	525
Four	4	C	400

(d) Delete row with index "Three" from DF2.

90 of 91

#importing pandas library

import pandas as pd

#Creating a Dataframe DF1 using 2-D Dictionary having values as lists

marks={'Pavan':[56,75,82,72,68],
       'Srinu':[90,91,98,92,85],
       'Sunitha':[78,64,96,88,76]}

DF1=pd.DataFrame(marks,index=['Telugu','English','Maths','Science','Social'])

print("Displaying Dataframe DF1.....")

print(DF1)

#Adding a new row with index "Hindi" to DF1

DF1.loc["Hindi"]=79,89,99

print("\nDataframe DF1 after adding Hindi Details...\n",DF1)

	Pavan	Srinu	Sunitha
Telugu	56	90	78
English	75	91	64
Maths	82	98	96
Science	72	92	88
Social	68	85	76

91 of 91

#Creating a Dataframe DF2 using a list of lists

#each inner list is a row

books=[[1,'C++',550],[2,'Python',625],
       [3,'Java',525],[4,'C',400]]

DF2=pd.DataFrame(books,columns=['SNo','BookName','Price'],
                 index=['One','Two','Three','Four'])

print("\nDisplaying Dataframe DF2.....")

print(DF2)

#Delete row with index "Three" from DF2.

DF2.drop('Three',inplace=True)

print("\nDataframe DF2 after removing Three Details...\n",DF2)

	SNo	BookName	Price
One	1	C++	550
Two	2	Python	625
Three	3	Java	525
Four	4	C	400
