DATAFRAMES PPT
XII – IP –PYTHON
2024.25
Data Handling using Pandas and Data Visualization
(25 Marks)
DATAFRAMES – SYLLABUS
From dictionary of Series, list of dictionaries, Text/CSV files;
A DataFrame is a Pandas data structure that stores data in a two-dimensional form. It is an ordered collection of columns, where the columns may store different types of data, e.g., integer, floating point, string or Boolean.
Characteristics:
DATAFRAME - INTRODUCTION
DATAFRAMES - CREATION
CREATING A DATAFRAME
Before creation, we need to import two modules.
import pandas (or) import pandas as pd
import numpy (or) import numpy as np
(In the place of pd or np, we can use any valid identifier)
Syntax:
<dataFrameObject>=pandas.DataFrame(
<a 2D datastructure>, [columns=<column sequence>],
[index=<index sequence>]).
We can create using:
Displaying a DataFrame is same as the way we display other variables and objects.
(i) Creating a DataFrame using a 2-D Dictionary:
A 2-D dictionary is a dictionary having items as (key:value), where value part is a data structure of any type i.e., another dictionary, an ndarray, a series object, a list, etc.
Value part of all the keys should have similar structure.
(a) Creating a dataframe from a 2D dictionary having values as lists:
>>>dict={'RNo':[51,52,53,54],'SName': ['Lahari','Chanakya','Harish','Neha'], 'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
Program to create a dataframe using 2-D Dictionary having values as lists:
import pandas as pd
dict={'RNo':[51,52,53,54],'SName':
['Lahari','Chanakya','Harish','Neha'],
'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
print(df)
output
By default, its index will be assigned 0 (zero) onwards.
Note: As per the textbook, the output columns are placed in ascending order, i.e., “Marks”, then “RNo”, then “SName”. In practice, the columns are displayed in the order in which they were entered.
Specifying Own Index:
>>>df=pd.DataFrame(dict,index=['First','Second','Third','Fourth'])
Note: If the number of index labels does not match the number of rows, a “ValueError” will occur.
Example :Given a dictionary that stores “State names” as index, “Mother Tongue” &“Population” as column names. Note: Population in crores.
Program:
import pandas as pd
dict={'Mother Tongue':['Telugu','Tamil','Hindi'],
'Population':[6,8,12]}
df=pd.DataFrame(dict,index=['AP','TN','Maharashtra'])
print(df)
(c) Creating a dataframe from a 2D dictionary having values as dictionary object:
dict={'RNo':{'First':51,'Second':52,'Third':53,'Fourth':54},
'SName':{'First':'Lahari','Second':'Chanakya','Third':'Harish','Fourth':'Neha'},
'Marks':{'First':55,'Second':62,'Third':52,'Fourth':75}}
df=pd.DataFrame(dict)
dict={'First':{'RNo':51,'SName':'Lahari','Marks':55},
'Second':{'RNo':52,'SName':'Chanakya','Marks':62},
'Third':{'RNo':53,'SName':'Harish','Marks':52},
'Fourth':{'RNo':54,'SName':'Neha','Marks':75}}
df=pd.DataFrame(dict)
Special Condition:
Two dictionaries with dissimilar keys as inner dictionaries of a 2D dictionary. For this DataFrame can be created with non-matching inner keys.
All the inner keys become indexes, NaN values will be added for non-matching keys of inner dictionaries.
Program:
import pandas as pd
C1={'Qty':95,'Half Yearly':89}
C2={'Half Yearly':94,'Annual':97}
Marks={'Student 1':C1,'Student 2':C2}
df=pd.DataFrame(Marks)
print(df)
OUTPUT
(ii) Creating a Dataframe Object from a List of Dictionaries/Lists:
(a) Creating a Dataframe using a list having List of dictionaries :
If we pass a 2D list having dictionaries as its elements (list of dictionaries) to pandas.DataFrame() function, it will create a DataFrame object such that the inner dictionary keys will become the columns and inner dictionary’s values will make rows.
Ex:
import pandas as pd
dict1={'RNo':51,'SName':'Lahari','Marks':55}
dict2={'RNo':52,'SName':'Chanakya','Marks':62}
dict3={'RNo':53,'SName':'Harish','Marks':52}
dict4={'RNo':54,'SName':'Neha','Marks':75}
students=[dict1,dict2,dict3,dict4]
df=pd.DataFrame(students)
print(df)
Note : We can also include indexes as follows:
df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])
Note: If we do not give the same column name in every row, “NaN” values will appear for the missing entries.
Program:
import pandas as pd
dict1={'RNo':51,'SName':'Lahari','Marks':55}
dict2={'RNo':52,'Name':'Chanakya','Marks':62}
dict3={'RNo':53,'Name':'Harish','Marks':52}
dict4={'RNo':54,'SName':'Neha','Marks':75}
students=[dict1,dict2,dict3,dict4]
df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])
print(df)
OUTPUT
(b) Creating using a list having List of lists:
lists=[[10,20,40],['A','B','C','D'],[33.5,55.75,2.5]]
df=pd.DataFrame(lists)
Inserting Rows & Column Names:
import pandas as pd
lists=[[51,'Lahari',55],[52,'Chanakya',62],[53,'Harish',52]]
#each inner list is a row
df=pd.DataFrame(lists,columns=['RNo','SName','Marks'],index=['First','Second','Third'])
print(df)
(iii) Creating a dataframe Object from a 2-D ndarray:
We can pass a two-dimensional NumPy array (i.e., one having shape (<m>,<n>)) to DataFrame( ) to create a dataframe object.
Consider the program to create np array:
import numpy as np
import pandas as pd
narr=np.array([[10,20,30],[40,50,60]],np.int32)
print(narr)
Program:
import numpy as np
import pandas as pd
narr=np.array([[10,20,30],[40,50,60]],np.int32)
mydf=pd.DataFrame(narr)
print(mydf)
Output of print(narr)
[[10 20 30]
 [40 50 60]]
OUTPUT
narr=np.array([[10.7,20.5],[40,50],[25.2,55]])
mydf=pd.DataFrame(narr,columns=["One","Two"],index=['A','B','C'])
print(mydf)
We can specify either columns or index or both the sequences.
Note : If, the rows of ndarrays differ in length, i.e., if number of elements in each row differ, then Python will create just single column in the dataframe object and the type of the column will be considered as object.
Example:
narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")
print(narr)
Output
[list([10.7, 20.5, 30.2]) list([40, 50]) list([25, 55, 11, 45])]
Program:
narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")
mydf=pd.DataFrame(narr)
Output
(iv) Creating a dataframe Object from a 2D
Dictionary with Values as Series Objects:
import pandas as pd
RN=pd.Series([11,12,13,14])
SN=pd.Series(['Rajesh','Likhith','Navya','Bhavya'])
M=pd.Series([56,75,91,82])
studict={'RNo':RN,'SName':SN,'Marks':M}
mydf=pd.DataFrame(studict)
print(mydf)
Output
(v) Creating a dataframe Object from another DataFrame Object:
Program:
import pandas as pd
dict={'RNo':[51,52,53,54],'SName':['Lahari','Chanakya',
'Harish','Neha'],'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
dfnew=pd.DataFrame(df)
print(dfnew)
OUTPUT
(new DataFrame created from existing DataFrame)
DATAFRAME - ATTRIBUTES
DATAFRAME ATTRIBUTES
All information related to a DataFrame such as its size, datatype, etc is available through its attributes.
Syntax to use a specific attribute:
<DataFrame object>.<attribute name>
Attribute | Description |
index | The index (row labels) of the DataFrame |
columns | The column labels of the DataFrame |
axes | It returns axis 0 i.e., index and axis 1 i.e., columns of the DataFrame |
dtypes | Return the data types of data in the DataFrame |
size | Return an int representing the number of elements in this object |
shape | Return a tuple representing the dimensionality of the DataFrame i.e., (no.of rows, no.of columns) |
values | Return a Numpy representation of the DataFrame |
empty | Indicates whether DataFrame is empty |
ndim | Return an int representing the number of axes/array dimensions. |
T | Transpose |
Example of a DataFrame DF:
Retrieving various properties of a DataFrame Object:
>>>df.index
Index(['First', 'Second', 'Third', 'Fourth'], dtype='object')
>>>df.index #for default indexes
RangeIndex(start=0, stop=4, step=1)
>>> df.columns
Index(['RNo', 'SName', 'Marks'], dtype='object')
>>>df.axes
[Index(['First', 'Second', 'Third', 'Fourth'], dtype='object'), Index(['RNo', 'SName', 'Marks'], dtype='object')]
>>>df.dtypes
RNo int64
SName object
Marks int64
dtype: object
>>>df.size#4 rows X 3 columns
12
>>>df.shape #(no.of rows, no.of columns)
(4, 3)
>>>df.values# Numpy representation
[ [51 'Lahari' 55]
[52 'Chanakya' 62]
[53 'Harish' 52]
[54 'Neha' 75] ]
>>>df.empty
#if DataFrame is empty, gives True
False
>>>df.ndim # As DataFrame is a 2 Dimensional
2
>>>df.T
#Transpose. Rows will become columns and vice versa.
Example of a DataFrame DF:
Function | Description |
len(<DF Object>) | Return the number of rows in a dataframe |
<DF Object>.count( ) | If we pass no argument or 0 (the default is 0), it returns the count of non-NA values for each column; if we pass 1, it returns the count of non-NA values for each row. |
OTHERS
>>>len(df)
4
>>>df.count( )
#df.count(0) or df.count(axis='index')
RNo 4
SName 4
Marks 4
dtype: int64
>>>df.count(1) # df.count(axis='columns')
First 3
Second 3
Third 3
Fourth 3
dtype: int64
>>>df.shape[0]# to get number of rows
4
>>>df.shape[1]# to get number of columns
3
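The difference between len( ) and count( ) shows up only when some values are missing. The following is a small sketch, assuming a hypothetical frame with one missing mark (the data is made up for illustration only):

```python
import pandas as pd

# Hypothetical data: the second student's mark is missing (None becomes NaN)
df = pd.DataFrame({'RNo': [51, 52, 53], 'Marks': [55, None, 52]},
                  index=['First', 'Second', 'Third'])

total_rows = len(df)            # counts rows, missing values included
per_column = df.count()         # non-NA values in each column (axis=0)
per_row = df.count(axis=1)      # non-NA values in each row

print(total_rows)
print(per_column)
print(per_row)
```

Here len(df) still counts the row with the missing mark, while count( ) leaves the missing value out.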
OPERATIONS ON DATAFRAMES
SELECTING/ACCESSING DATA
&
MODIFYING, ADDING DATA
Create the following DataFrame in any method
df
import pandas as pd
dict={'Eng':[68,72,66],'Tel':[55,84,90],'Mat':[60,70,65],'Soc':[80,90,85]}
df=pd.DataFrame(dict,index=['Raj','Pavan','Mohan'])
print(df)
Selecting/Accessing a subset from a DataFrame using Row/Column Names using loc function:
To access row(s) and/or a combination of rows and columns, we can use loc function.
Syntax:
<DataFrame Object>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]
Note: With loc, Both start label and end label are included when given as start:end
Selecting/Accessing a subset from a DataFrame using Row/Column Names using iloc function:
With this function, we can extract, subset from dataframe using the row and column numeric index/position. iloc means integer location.
Syntax:
<DF Object>.iloc[<start row index>:<end row index>, <start col index>:<end column index>]
Note: With iloc, like slices end index/position is excluded when given as start:end.
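The two notes above (loc includes the end label, iloc excludes the end position) can be checked with a short sketch. It assumes the marks DataFrame used in this section; the variable names by_label and by_position are chosen only for illustration.

```python
import pandas as pd

marks = {'Eng': [68, 72, 66], 'Tel': [55, 84, 90],
         'Mat': [60, 70, 65], 'Soc': [80, 90, 85]}
df = pd.DataFrame(marks, index=['Raj', 'Pavan', 'Mohan'])

# loc: label slices, the end labels 'Pavan' and 'Tel' are included
by_label = df.loc['Raj':'Pavan', 'Eng':'Tel']

# iloc: position slices, the end positions 1 and 2 are excluded
by_position = df.iloc[0:1, 0:2]

print(by_label)       # rows Raj, Pavan and columns Eng, Tel
print(by_position)    # row Raj only and columns Eng, Tel
```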
.at function: Access a single value for a row/column label pair by labels.
Syntax:<DF Object>.at[<row label>,<col label>]
.iat function: Access a single value for a row/column pair by integer position.
Syntax:
<DF Object>.iat[<row index no>,<col index no>]
SINGLE COLUMN
SELECTING/ACCESSING a column:
Syntax:<DataFrame object> [<column name>]
(or)<DataFrame object>.<column name>
>>>df['Eng']
Raj 68
Pavan 72
Mohan 66
Name: Eng, dtype: int64
>>>df.Eng
Raj 68
Pavan 72
Mohan 66
Name: Eng, dtype: int64
MODIFYING a Column:
Note: Assigning values to a new column label that does not exist will create a new column at the end. If the column already exists in the DataFrame then the assignment statement will update the values of the already existing column, for example:
df['Eng']=[40,50,60]
df['Tel']=55
df.Mat=70,80,90
df.Soc=100
Note : If we give following,
>>> df.corporate=11,12,13 or
>>> df.corporate=[11,12,13],
No error is raised (pandas may show a UserWarning), but no column is added to the DataFrame; the value is stored only as an attribute of the object.
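The note above can be verified with a short sketch. It assumes a two-column version of the marks DataFrame; the warning filter is added here only to keep the output clean, and has_column is a name chosen for illustration.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Eng': [68, 72, 66], 'Tel': [55, 84, 90]},
                  index=['Raj', 'Pavan', 'Mohan'])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")    # pandas may warn about this pattern
    df.corporate = [11, 12, 13]        # dot notation with a new name

has_column = 'corporate' in df.columns
print(has_column)                      # the column was not created

df['corporate'] = [11, 12, 13]         # square brackets do create the column
print(list(df.columns))
```

So dot notation is suitable only for reading or updating a column that already exists; to add a column, use square brackets.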
ADDING a Column:
>>>df['Hin']=[89,78,76]
SELECTING/ACCESSING a column (loc):
>>>df.loc[:,'Eng']
Raj 68
Pavan 72
Mohan 66
Name: Eng, dtype: int64
MODIFYING a Column (loc):
>>>df.loc[:,'Eng']=[10,20,30]
# df.loc[:,'Eng']=10,20,30
>>>df.loc[:,'Mat']=100
ADDING a Column (loc):
>>>df.loc[:,'IP']=[10,20,30]
>>>df.loc[:,'Hin']=50
SELECTING/ACCESSING a column (iloc):
>>>df.iloc[:,1]
Raj 55
Pavan 84
Mohan 90
Name: Tel, dtype: int64
>>>df.iloc[:,[1]]
Tel
Raj 55
Pavan 84
Mohan 90
MODIFYING a Column (iloc):
>>>df.iloc[:,1]=[40,50,60]
>>>df.iloc[:,3]=70
Note: We cannot add a Column using iloc.
If you try to add new column using iloc, “IndexError” will come.
Ex:
>>>df.iloc[:,4]=95
IndexError : iloc cannot enlarge its target object
>>> df.iloc[:,1:3]=[[1,2],[3,4],[5,6]]
MULTIPLE COLUMNS
SELECTING/ACCESSING multiple column:
<DataFrame object>[ [<column name>,<column name>,…..] ]
>>>df[['Tel','Soc','Mat']]
MODIFYING multiple Columns values:
>>>df[['Tel','Soc','Mat']]=10,20,30
# df[['Tel','Soc','Mat']]=[10,20,30]
>>> df[['Tel','Soc','Mat']]=[[1,2,3],[4,5,6],[7,8,9]]
SELECTING/ACCESSING multiple columns (loc):
>>> df.loc[:,'Eng':'Mat']
Note: All columns between start and end columns are listed.
>>> df.loc[:,'Tel':]
>>>df.loc[:,'Mat':'Eng']
Empty DataFrame
Columns: []
Index: [Raj, Pavan, Mohan]
>>>df.loc[:,['Soc','Tel','Eng']]
MODIFYING multiple Columns values (loc):
>>>df.loc[:,'Eng':'Mat']=50,60,70
>>>df.loc[:,['Soc','Tel','Eng']]=10,20,30
SELECTING/ACCESSING multiple columns (iloc):
>>> df.iloc[:,1:3] #Excluding column 3
>>> df.iloc[:,[2,0]]
>>> df.iloc[:,1:]
>>> df.iloc[:,2:0]
Empty DataFrame
Columns: []
Index: [Raj, Pavan, Mohan]
>>> df.iloc[:,[2,0,1]]
MODIFYING multiple Columns values (iloc):
>>>df.iloc[:,1:3]=[25,35]
HEAD & TAIL FUNCTIONS
head(n): To display the first n rows in the DataFrame. Default value of n is 5.
tail(n): To display the last n rows in the DataFrame. Default value of n is 5.
Create the following DataFrame “MyDF”.
Execute the following commands:
MyDF.head(3)
MyDF.head( )
MyDF.head(15)
MyDF.head(-3)
MyDF.tail(3)
MyDF.tail( )
MyDF.tail(777)
MyDF.tail(-3)
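The commands above can be tried on any DataFrame. The following sketch assumes a hypothetical 7-row MyDF (the values are made up for illustration) and prints how many rows each call returns:

```python
import pandas as pd

# Hypothetical 7-row DataFrame, used only for illustration
MyDF = pd.DataFrame({'RNo': range(1, 8),
                     'Marks': [55, 62, 52, 75, 81, 49, 90]})

print(len(MyDF.head(3)))     # first 3 rows
print(len(MyDF.head()))      # default n=5, so first 5 rows
print(len(MyDF.head(15)))    # n is more than the number of rows: all rows
print(len(MyDF.head(-3)))    # all rows except the last 3
print(len(MyDF.tail(-3)))    # all rows except the first 3
```

A negative n therefore means "leave out n rows from the other end".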
Create the following DataFrame in any method
df
SINGLE ROW
SELECTING/ACCESSING one row (loc):
Just give the row name/label.
>>>df.loc['Pavan']
# df.loc['Pavan',] or df.loc['Pavan',:]
Eng 72
Tel 84
Mat 70
Soc 90
Name: Pavan, dtype: int64
>>> df.loc['Kiran']
KeyError: 'Kiran'
MODIFYING one row (loc):
>>>df.loc["Raj"]=91,92,93,94
#df.loc[“Raj”,:] = [91,92,93,94]
>>>df.loc["Pavan"]=100
>>> df.loc['Mohan',:]=601,602,603
ValueError: could not broadcast input array from shape (3,) into shape (4,)
ADDING one row (loc):
>>>df.loc['Kumar']=91,92,93,94
Note: If we try to add a row with lesser values than the number of columns in the DataFrame, it results in a ValueError, with the error message: ValueError: Cannot set a row with mismatched columns.
Similarly, if we try to add a column with lesser values than the number of rows in the DataFrame, it results in a ValueError, with the error message: ValueError: Length of values does not match length of index.
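Both errors described in the note can be reproduced with a small sketch. It assumes the marks DataFrame of this section; row_error and col_error are names chosen for illustration, and the exact wording of the messages may vary between pandas versions.

```python
import pandas as pd

marks = {'Eng': [68, 72, 66], 'Tel': [55, 84, 90],
         'Mat': [60, 70, 65], 'Soc': [80, 90, 85]}
df = pd.DataFrame(marks, index=['Raj', 'Pavan', 'Mohan'])

row_error = None
try:
    df.loc['Kumar'] = [91, 92, 93]     # 3 values for 4 columns
except ValueError as err:
    row_error = err

col_error = None
try:
    df['Hin'] = [89, 78]               # 2 values for 3 rows
except ValueError as err:
    col_error = err

print("Row error:", row_error)
print("Column error:", col_error)
```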
SELECTING/ACCESSING one row (iloc):
>>>df.iloc[1] #df.iloc[1,] or df.iloc[1,:]
Eng 72
Tel 84
Mat 70
Soc 90
Name: Pavan, dtype: int64
>>> df.iloc[4]
IndexError: single positional indexer is out-of-bounds
MODIFYING one row (iloc):
>>>df.iloc[2]=75
>>>df.iloc[1]=81,82,83,84
# df.iloc[1]=[81,82,83,84]
# df.iloc[1,:]=[81,82,83,84]
Note: We cannot add a row using iloc.
If you try to add a new row using iloc, an “IndexError” will occur.
Ex:
>>>df.iloc[4]=91,92,93,94
IndexError : iloc cannot enlarge its target object
>>> df.iloc[[2,0]]=[[100,200,300,400],[11,22,33,44]]
MULTIPLE ROWS
SELECTING/ACCESSING multiple rows (loc):
>>>df.loc['Raj':'Mohan']
# df.loc['Raj':'Mohan', ] or df.loc['Raj':'Mohan', :]
>>>df.loc['Pavan':'Mohan']
>>>df.loc[['Mohan','Raj']]
>>>df.loc['Pavan':'Raj']
Empty DataFrame
Columns: [Eng, Tel, Mat, Soc]
Index: [ ]
MODIFYING multiple rows (loc):
>>> df.loc[['Mohan','Raj']]=[[1,2,3,4],[5,6,7,8]]
SELECTING/ACCESSING multiple rows (iloc):
>>> df.iloc[0:3] # df.iloc[0:3,] or df.iloc[0:3,:]
>>>df.iloc[0:2]
>>>df.iloc[1:10]
>>>df.iloc[1:1]
Empty DataFrame
Columns: [Eng, Tel, Mat, Soc]
Index: [ ]
>>>df.iloc[[2,1]] #df.iloc[[2,1], ] or df.iloc[[2,1], : ]
MODIFYING multiple rows (iloc):
>>>df.iloc[0:2]=[[1,2,3,4],[5,6,7,8]]
Modifying All Rows (using a plain slice):
>>>df[ : ]
>>>df[ : ] = 10
RANGE OF COLUMNS
FROM A RANGE OF ROWS
SELECTING/ACCESSING range of columns from a range of rows (loc):
<DF Object>.loc[<startrow>:<endrow>,
<startcolumn>:<endcolumn>]
>>> df.loc['Pavan':'Mohan','Tel':'Soc']
>>>df.loc['Mohan':'Raj','Eng':'Soc']
Empty DataFrame
Columns: [Eng, Tel, Mat, Soc]
Index: []
>>>df.loc['Raj':'Pavan','Mat':'Eng']
Empty DataFrame
Columns: []
Index: [Raj, Pavan]
MODIFYING range of columns from a range of rows (loc):
>>>df.loc['Pavan':'Mohan','Tel':'Soc']=[[1,2,3],[4,5,6]]
SELECTING/ACCESSING range of columns from a range of rows (iloc):
>>> df.iloc[1:3,0:2] #Rows 1,2 & Columns 0,1
>>>df.iloc[[1,2],[2,0,1]]
>>> df.iloc[2:2,0:2]
Empty DataFrame
Columns: [Eng, Tel]
Index: []
>>> df.iloc[1:3,2:0]
Empty DataFrame
Columns: []
Index: [Pavan, Mohan]
>>> df.iloc[[1,3],0:2]
IndexError: positional indexers are out-of-bounds
MODIFYING range of columns
from a range of rows (iloc):
>>>df.iloc[0:2,1:4]=[[21,22,23],[31,32,33]]
RENAMING ROWS/COLUMNS
To change the name of any row/column individually, we can use the rename( ) function.
rename( ) function by default does not make changes in the original dataframe. It creates a new dataframe with the changes and the original dataframe remains unchanged.
Syntax:
<DF>.rename(index={<names dictionary>},
columns={<names dictionary>}, inplace=False)
Renaming Row Indexes:
>>>df.rename(index={'Raj':'Mr.Rajesh','Mohan':'Mohan Garu'},inplace=True)
Renaming Column Indexes (Column Labels):
>>> df.rename(columns={'Eng':'English', 'Mat':'Maths'},inplace=True)
Another Example:
dict={'RNo':[51,52,53],'SName':['Suresh','Naresh','Bhavesh']}
df=pd.DataFrame(dict, index=['First','Second','Third'])
>>>df.rename(index={'Second':'Two'}, columns={'RNo':'RollNo'},inplace=True)
Note: If we do not add “inplace=True”, the command only displays the modified result when it is executed; the original DataFrame is not modified. To modify the DataFrame itself, we need to add “inplace=True”.
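The effect of leaving out inplace=True can be seen in the sketch below. It reuses the RNo/SName example from this slide; the variable name renamed is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'RNo': [51, 52, 53],
                   'SName': ['Suresh', 'Naresh', 'Bhavesh']},
                  index=['First', 'Second', 'Third'])

# Without inplace=True, rename( ) returns a new DataFrame
renamed = df.rename(index={'Second': 'Two'}, columns={'RNo': 'RollNo'})

print(list(df.columns), list(df.index))            # original is unchanged
print(list(renamed.columns), list(renamed.index))  # new names appear here
```

Storing the result in a variable (or back in df) is an alternative to using inplace=True.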
Create the following DataFrame in any method
df
SINGLE VALUE
SELECTING/ACCESSING a single value:
Either give name of row or numeric index in square brackets.
Syntax:<DF Object>.<column>
[<row name or row numeric index>]
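Ex: >>>df.Eng['Pavan']
72
MODIFYING a single value:
>>>df.Eng['Pavan']=200 # will change the value to 200
>>> df.Tel[0]=500
Note: This form is a chained assignment. Recent pandas versions may show a warning or leave the DataFrame unchanged, so df.loc['Pavan','Eng']=200 is the safer form.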
SELECTING/ACCESSING a single value (loc):
>>>df.loc['Pavan','Mat']
70
MODIFYING a single value (loc):
Specify the row label and the column name, then assign the new value.
>>>df.loc['Pavan','Mat']=100
SELECTING/ACCESSING a single value (iloc):
>>>df.iloc[2,3]
85
MODIFYING a single value (iloc):
>>>df.iloc[2,3]=500
.at function: Access a single value for a row/column label pair by labels.
Syntax:<DF Object>.at[<row label>,<col label>]
>>> df.at['Raj','Mat']
60
>>> df.at['Raj','Mat']=150 # will change the value to 150
>>> df.at['Kiran','Soc']
KeyError: 'Kiran'
>>> df.at['Raj','IP']
KeyError: 'IP'
.iat function: Access a single value for a row/column pair by integer position.
Syntax:
<DF Object>.iat[<row index no>,<col index no>]
>>> df.iat[2,2]
65
# df.iat[2,3]=30 will change the value to 30
ASSIGN FUNCTION
<DF object>=<DF object>.assign(<column name>=<values for column>)
>>> df=df.assign(Mat=[10,11,12])
>>>df=df.assign(IP=[81,82,83])
>>>df=df.assign(Tel=77)
>>>df=df.assign(New=[55,56])
ValueError: Length of values (2) does not match length of index (3)
DELETING ROWS/COLUMNS
Two ways to delete: the del statement (columns only) and the drop( ) function (rows and columns)
We can use the DataFrame.drop() method to delete rows and columns from a DataFrame. We need to specify the names of the labels to be dropped and the axis from which they need to be dropped. To delete a row, the parameter axis is assigned the value 0 and for deleting a column,the parameter axis is assigned the value 1.
(i) Delete row(s) using drop( ) function:
Syntax:<DF>.drop(index or sequence of indexes)
>>> df.drop('Pavan',axis=0,inplace=True)
#df.drop('Pavan',inplace=True)
#df=df.drop('Pavan',axis=0)
# Default axis is 0, so no need to give
>>> df.drop(['Raj','Pavan'],inplace=True)
Note: If the DataFrame has more than one row with the same label, the DataFrame.drop() method will delete all the matching rows from it.
(Other examples:
df.drop(range(2,15,3)) – 2,5,8,11,14
df.drop([2,4,6,8,12])
Argument to drop( ) should be either an index, or a sequence containing indexes.)
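The note about repeated row labels can be checked with a short sketch. It assumes a hypothetical DataFrame in which the label 'Pavan' appears twice; the names after_row_drop and after_col_drop are chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: the row label 'Pavan' is repeated
df = pd.DataFrame({'Eng': [68, 72, 66, 70], 'Tel': [55, 84, 90, 61]},
                  index=['Raj', 'Pavan', 'Mohan', 'Pavan'])

after_row_drop = df.drop('Pavan')          # axis=0 is the default
after_col_drop = df.drop('Tel', axis=1)    # axis=1 drops a column

print(after_row_drop)    # both 'Pavan' rows are removed
print(after_col_drop)    # only the 'Eng' column remains
print(len(df))           # original still has 4 rows (inplace not used)
```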
(ii) Delete a column, using drop( ) function:
>>> df.drop('Tel',axis=1,inplace=True)
>>>df.drop(['Soc','Eng'],axis=1,inplace=True)
(iii) Delete a column, using the del statement:
Syntax: del <DF object>[<column name>]
>>> del df['Mat']
ITERATION (Pandas 2 Chapter)
Iterating Over a Data Frame
Iterating Over a DataFrame:
>>> dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
>>>DF=pd.DataFrame(dict,index=['Private','Govt'])
iterrows( ) : This method iterates over dataframe row wise where each horizontal subset is in the form of (row-index,Series) where Series contains all column values for that row-index.
Example Program: Using iterrows( ) to
extract data from dataframe row wise.
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for (row,rowSeries) in DF.iterrows():
    print("Row index:", row)
    print("Containing: ")
    print(rowSeries)
Row index: Private
Containing:
Teachers 20
Students 200
Ratio 10
Name: Private, dtype: int64
Row index: Govt
Containing:
Teachers 10
Students 150
Ratio 15
Name: Govt, dtype: int64
OUTPUT
Example : Using iterrows( ) to extract row-wise Series objects
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for (row,rowSeries) in DF.iterrows():
    print("Row index:",row)
    print("Containing: ")
    i=0
    for val in rowSeries:
        print("At",i,"position: ",val)
        i=i+1
OUTPUT
Row index: Private
Containing:
At 0 position: 20
At 1 position: 200
At 2 position: 10
Row index: Govt
Containing:
At 0 position: 10
At 1 position: 150
At 2 position: 15
Write a program to print the DataFrame DF, one row at a time
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for i,j in DF.iterrows():
    print(i)
    print(j)
    print("____________")
OUTPUT
Private
Teachers 20
Students 200
Ratio 10
Name: Private, dtype: int64
____________
Govt
Teachers 10
Students 150
Ratio 15
Name: Govt, dtype: int64
Putting Individual columns from a row:
When accessing the rows of a DataFrame using iterrows( ), you can print an individual column value from a row as rowSeries[<column>]. That is, after the line
for r, Row in df.iterrows( ):
you can print an individual column value as:
Row[<column name>]
Write a program to print only the values from Teachers column, for each row
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for row,rowSeries in DF.iterrows():
    print(rowSeries['Teachers'])
    print("------")
OUTPUT
20
------
10
------
iteritems( ): This method iterates over dataframe column wise where each vertical subset is in the form of (col-index,Series) where Series contains all row values for that column-index.
Note: in present versions, iteritems( ) is
replaced with items( )
Example : Using iteritems( ) to extract data from
dataframe column wise.
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for (col,colSeries) in DF.items(): # iteritems( )
    print("Column index:",col)
    print("Containing: ")
    print(colSeries)
Column index: Teachers
Containing:
Private 20
Govt 10
Name: Teachers, dtype: int64
Column index: Students
Containing:
Private 200
Govt 150
Name: Students, dtype: int64
Column index: Ratio
Containing:
Private 10
Govt 15
Name: Ratio, dtype: int64
OUTPUT
Example : Using iteritems( ) to extract
dataframe column wise series object
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for (col,colSeries) in DF.items(): #iteritems( )
    print("Column index:",col)
    print("Containing: ")
    i=0
    for val in colSeries:
        print("At row ",i,":",val)
        i=i+1
OUTPUT
Column index: Teachers
Containing:
At row 0 : 20
At row 1 : 10
Column index: Students
Containing:
At row 0 : 200
At row 1 : 150
Column index: Ratio
Containing:
At row 0 : 10
At row 1 : 15
Write a program to print the DataFrame DF, one column at a time
import pandas as pd
dict={'Teachers':[20,10],'Students':[200,150],
'Ratio':[10,15]}
DF=pd.DataFrame(dict,index=['Private','Govt'])
for i,j in DF.items(): #iteritems( )
    print(i)
    print(j)
    print("____________")
OUTPUT
Teachers
Private 20
Govt 10
Name: Teachers, dtype: int64
____________
Students
Private 200
Govt 150
Name: Students, dtype: int64
____________
Ratio
Private 10
Govt 15
Name: Ratio, dtype: int64
INDEXING
Data elements in a DataFrame can be accessed using indexing.
There are two ways of indexing Dataframes :
Label based indexing and Boolean Indexing.
LABEL BASED INDEXING
Note: We have already covered this topic; it is discussed again here under the heading “Indexing”.
There are several methods in Pandas to implement label based indexing.
DataFrame.loc[ ] is an important method
>>> df.loc['Pavan']
Note: When the row label is passed as an integer value, it is interpreted as a label of the index and not as an integer position along the index.
Ex:
>>>MyDF = pd.DataFrame([10,20,30,40,50])
>>>MyDF
>>>MyDF.loc[3]
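With the default index, the label 3 and the position 3 refer to the same row, so the difference is not visible. The sketch below assumes a hypothetical MyDF whose integer labels start at 3 (the labels and the column name 'Val' are made up for illustration):

```python
import pandas as pd

# Hypothetical integer labels 3..7 instead of the default 0..4
MyDF = pd.DataFrame({'Val': [10, 20, 30, 40, 50]}, index=[3, 4, 5, 6, 7])

by_label = int(MyDF.loc[3, 'Val'])      # 3 is treated as a label: first row
by_position = int(MyDF.iloc[3, 0])      # 3 is treated as a position: fourth row

print(by_label)
print(by_position)
```

Here loc[3] picks the row labelled 3, while iloc[3] picks the row at position 3.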
When a single column label is passed, it returns the column as a Series.
>>>df.loc[:,'Mat']
Also, we can obtain the same result that is the marks of ‘Mat’ subject by using the command:
>>>df['Mat']
To read more than one row from a DataFrame, a list of row labels is used as shown below.
Note that using [[ ]] returns a DataFrame.
>>>df.loc[['Mohan', 'Raj']]
BOOLEAN INDEXING
Boolean Indexing – NCERT
Boolean means a binary variable that can represent
either of the two states - True (indicated by 1) or False (indicated by 0).
In Boolean indexing, we can select the subsets of data based on the actual values in the DataFrame rather than their row/column labels.
Thus, we can use conditions on column names to filter data values.
Consider the above DataFrame df, the following statement displays True or False depending on whether the data value satisfies the given condition or not.
>>> df.loc['Mohan']>75
To check scores of ‘Mat’ subject, who scored more than 75, we can write:
>>> df.loc[:,'Mat']>75
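The True/False result of such a condition can be passed back to the DataFrame to keep only the matching rows. The following is a minimal sketch, assuming the marks DataFrame of this section; the cut-off 62 and the names mask and high_mat are chosen only for illustration.

```python
import pandas as pd

marks = {'Eng': [68, 72, 66], 'Tel': [55, 84, 90],
         'Mat': [60, 70, 65], 'Soc': [80, 90, 85]}
df = pd.DataFrame(marks, index=['Raj', 'Pavan', 'Mohan'])

mask = df.loc[:, 'Mat'] > 62     # Boolean Series, one value per row
high_mat = df.loc[mask]          # keeps only the rows where mask is True

print(mask)
print(high_mat)
```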
Boolean Indexing Example 2
(From Other Material)
import pandas as pd
import numpy as np
dic = {'std1':{'no':101,'name':'hari','city':'tenali'}, 'std2':{'no':102,'name':'vasu','city':'guntur'},
'std3':{'no':103,'name':'kishore','city':'bapatla'}}
df = pd.DataFrame(dic)
print(df)
***
std1 std2 std3
no 101 102 103
name hari vasu kishore
city tenali guntur bapatla
***
lis = pd.DataFrame([[12,14,16],[18,20,22],[24,26,28]])
print(lis)
***
0 1 2
0 12 14 16
1 18 20 22
2 24 26 28
***
lis>20
***
0 1 2
0 False False False
1 False False True
2 True True True
lis[lis>20]
0 1 2
0 NaN NaN NaN
1 NaN NaN 22.0
2 24.0 26.0 28.0
***
lis.loc[1]>20
***
0 False
1 False
2 True
Name: 1, dtype: bool
#creating a dataframe
dic1 = {'kiran':{'mat':67,'sci':89,'soc':93},
'rajani':{'mat':95,'sci':96,'soc':99},
'rani':{'mat':99,'sci':100,'soc':91}}
df2 = pd.DataFrame(dic1)
print(df2)
***
kiran rajani rani
mat 67 95 99
sci 89 96 100
soc 93 99 91
***
#filtering on row
df2.loc['mat']>90
***
kiran False
rajani True
rani True
Name: mat, dtype: bool
***
#filtering on column
df2.loc[:,'rajani']>95
***
mat False
sci True
soc True
Name: rajani, dtype: bool
#according to sumitha arora
df2>90
***
kiran rajani rani
mat False True True
sci False True True
soc True True True
***
#according to sumitha arora
df2[df2>90]
***
kiran rajani rani
mat NaN 95 99
sci NaN 96 100
soc 93.0 99 91
Accessing DataFrames Element through Slicing:
We can use slicing to select a subset of rows and/or columns from a DataFrame. To retrieve a set of rows, slicing can be used with row labels.
For example:
>>>df.loc['Pavan':'Mohan']
Here, the rows with labels Pavan and Mohan are displayed.
Note that in DataFrames slicing is inclusive of the end values. We may use a slice of labels with a column name to access values of those rows in that column only.
For example, the following statement displays the rows with label Raj and Mohan, and column with label Soc:
>>> df.loc[['Raj','Mohan'],'Soc']
>>> df.loc['Pavan', 'Eng':'Mat']
>>> df.loc['Raj': 'Mohan', 'Tel':'Soc']
>>>df.loc['Pavan': 'Mohan',['Eng','Mat']]
Filtering Rows in DataFrames
In DataFrames, Boolean values like True (1) and False (0) can be associated with indices. They can also be used to filter the records using the DataFrame.loc[ ] method.
In order to select or omit particular row(s), we can use a Boolean list specifying ‘True’ for the rows to be shown and ‘False’ for the ones to be omitted in the output.
For example, in the following statement, the row having index Pavan is omitted:
>>> df.loc[[True, False, True]]
>>> df.loc[[True, True, False]]
>>> df.loc[[False, True, True]]
>>> df.loc[[True, True, True]]
>>> df.loc[[False,False,False]]
BINARY OPERATIONS IN A DATAFRAME
Binary operations mean operations requiring two values to perform and these values are picked element wise.
In a binary operation, the data from the two dataframes is aligned on the basis of their row and column indexes. For each matching row and column index, the given operation is performed; for each non-matching row or column index, NaN is stored in the result.
Binary Operations:
addition, subtraction, multiplication, division
import pandas as pd
dict1={'A':[11,17,23],'B':[13,19,25],'C':[15,21,27]}
DF1=pd.DataFrame(dict1)
dict2={'A':[12,18,24],'B':[14,20,26],'C':[16,22,28]}
DF2=pd.DataFrame(dict2)
dict3={'A':[1,3,5],'B':[2,4,6]}
DF3=pd.DataFrame(dict3)
dict4={'A':[7,9],'B':[8,10]}
DF4=pd.DataFrame(dict4)
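The alignment rule can be seen by adding frames of different shapes. The sketch below repeats three of the DataFrames defined above so that it runs on its own; the names total13 and total34 are chosen only for illustration.

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

total13 = DF1 + DF3    # DF3 has no column C, so column C becomes NaN
total34 = DF3 + DF4    # DF4 has no row 2, so row 2 becomes NaN

print(total13)
print(total34)
```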
Addition : [ Using +, add( ), radd( ) ]
Note : DF1.add(DF2) is equal to DF1+DF2
DF1.radd(DF2) is equal to DF2+DF1
radd( ) means reverse addition
>>>DF1+DF2 #DF1.add(DF2)
>>>DF1+DF3
>>>DF1+DF4
>>>DF3+DF4
>>>DF3.add(DF4)
Subtraction: [ Using -, sub( ), rsub( ) ]
Note : DF1.sub(DF2) is equal to DF1-DF2
DF1.rsub(DF2) is equal to DF2-DF1
rsub( ) means reverse subtraction
>>>DF1-DF2
>>>DF2-DF1
>>>DF1-DF3
>>>DF3-DF1
>>>DF3-DF4 #DF3.sub(DF4)
>>>DF4-DF3 #DF3.rsub(DF4)
Multiplication: [ Using *, mul( ), rmul( ) ]
Note : DF1.mul(DF2) is equal to DF1*DF2
DF1.rmul(DF2) is equal to DF2*DF1
rmul( ) means reverse multiplication
>>>DF1*DF2
>>>DF1*DF3
Division: [ Using /, div( ), rdiv( ) ]
Note : DF1.div(DF2) is equal to DF1/DF2
DF1.rdiv(DF2) is equal to DF2/DF1
rdiv( ) means reverse division.
>>>DF1/DF2
>>>DF2/DF1
>>>DF2/DF3
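The "reverse" functions matter only for operations where the order of the operands changes the result, such as subtraction and division. A small sketch, using DF1 and DF2 from this section (the names forward and reverse are chosen only for illustration):

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF2 = pd.DataFrame({'A': [12, 18, 24], 'B': [14, 20, 26], 'C': [16, 22, 28]})

forward = DF1.sub(DF2)     # same as DF1 - DF2
reverse = DF1.rsub(DF2)    # same as DF2 - DF1

print(forward)
print(reverse)
print(DF1.rdiv(DF2).equals(DF2 / DF1))   # rdiv( ) swaps the operands too
```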
DATAFRAME ATTRIBUTES
All information related to a DataFrame such as its size, datatype, etc is available through its attributes. Syntax to use a specific attribute: <DataFrame object>.<attribute name>
Attribute | Description |
index | To display/assign index (row labels) of the DataFrame. |
columns | To display/assign the column labels of the DataFrame. |
axes | It returns both axis 0 i.e., index and axis 1 i.e., columns of the DataFrame. |
dtypes | to display data type of each column in the DataFrame. |
values | To display a NumPy ndarray having all the values in the DataFrame, without the axes labels. |
shape | Return a tuple representing the dimensionality of the DataFrame i.e., (no.of rows, no.of columns) |
size | Return an int representing the number of elements in this object. |
empty | Returns the value True if DataFrame is empty and False otherwise |
ndim | Return an int representing the number of axes/array dimensions. |
T | To Transpose the DataFrame. Means, row indices and column labels of the DataFrame replace each other’s position |
Retrieving various properties of a DataFrame Object:
>>>df.index
Index(['Raj', 'Pavan', 'Mohan'], dtype='object')
>>> df.index=['A','B','C']
>>> df.columns
Index(['Eng', 'Tel'], dtype='object')
>>>df.columns=['M','N']
>>>df
>>> df.axes
[Index(['Raj', 'Pavan', 'Mohan'], dtype='object'), Index(['Eng', 'Tel'], dtype='object')]
>>> df.dtypes
>>>df.values # Numpy representation
>>> df.shape #(no.of rows, no.of columns)
(3,2)
>>> df.size #3 rows X 2 columns
6
>>> df.empty
#if DataFrame is empty, gives True
False
>>>df.ndim
# As DataFrame is a 2 Dimensional
2
>>>df.T
#Transpose. Rows will become columns and vice versa.
RECORD PROGRAMS (DATAFRAMES : 6 to 13)
6. Write a Program to create
(a) A Dataframe DF1 using Dictionary having values as lists and display it
| Pavan | Srinu | Sunitha |
Telugu | 56 | 90 | 78 |
English | 75 | 91 | 64 |
Maths | 82 | 98 | 96 |
Science | 72 | 92 | 88 |
Social | 68 | 85 | 76 |
(b) Add a new Row with index “Hindi” with values 79,89 and 99 in DF1
(c) A Dataframe DF2 using a list having list of lists
DF1
| SNo | BookName | Price |
One | 1 | C++ | 550 |
Two | 2 | Python | 625 |
Three | 3 | Java | 525 |
Four | 4 | C | 400 |
DF2
(d) Delete row with index “Three” from DF2.
#importing pandas library
import pandas as pd
#Creating a Dataframe DF1 using 2-D Dictionary having values as lists
marks={'Pavan':[56,75,82,72,68],'Srinu':
[90,91,98,92,85], 'Sunitha':[78,64,96,88,76]}
DF1=pd.DataFrame(marks,index=
['Telugu','English','Maths','Science','Social'])
print("Displaying Dataframe DF1.....")
print(DF1)
DF1.loc["Hindi"]=79,89,99
print("\nDataframe DF1 after adding Hindi Details...\n",DF1)
| Pavan | Srinu | Sunitha |
Telugu | 56 | 90 | 78 |
English | 75 | 91 | 64 |
Maths | 82 | 98 | 96 |
Science | 72 | 92 | 88 |
Social | 68 | 85 | 76 |
#Creating a Dataframe DF2 using a list having list of lists
books=[[1,'C++',550],[2,'Python',625],
[3,'Java',525], [4,'C',400]]
#each inner list is a row
DF2=pd.DataFrame(books,columns=['SNo',
'BookName','Price'], index=['One','Two','Three','Four'])
print("\nDisplaying Dataframe DF2.....")
print(DF2)
#Delete row with index “Three” from DF2.
DF2.drop('Three',inplace=True)
print("\nDataframe DF2 after removing Three Details...\n",DF2)
| SNo | BookName | Price |
One | 1 | C++ | 550 |
Two | 2 | Python | 625 |
Three | 3 | Java | 525 |
Four | 4 | C | 400 |