DATAFRAMES PPT
XII – IP –PYTHON
2022.23 Python Syllabus
Data Handling using Pandas and Data Visualization
(25 Marks)
Introduction to Python libraries- Pandas, Matplotlib.
Data structures in Pandas - Series and Data Frames.
Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical operations;
Head and Tail functions; Selection, Indexing and Slicing.
Data Frames: Creation - from dictionary of Series, list of dictionaries, Text/CSV files; display; iteration; Operations on rows and columns: add, select, delete, rename; Head and Tail functions; Indexing using Labels, Boolean Indexing;
Importing/Exporting Data between CSV files and Data Frames.
Data Visualization: Purpose of plotting; drawing and saving the following types of plots using Matplotlib – line plot, bar graph, histogram.
Customizing plots: adding label, title, and legend in plots.
A DataFrame is a Pandas data structure that stores data in a two-dimensional layout of rows and columns. It is an ordered collection of columns, where different columns may store different types of data, e.g., integer, floating point, string or Boolean.
Characteristics:
DATAFRAME
CREATING A DATAFRAME
Before creation, we need to import two modules.
import pandas (or) import pandas as pd
import numpy (or) import numpy as np
(In the place of pd or np, we can use any valid identifier)
Syntax:
<dataFrameObject>=pandas.DataFrame(<a 2D data structure>, [columns=<column sequence>], [index=<index sequence>])
We can create using:
A DataFrame is displayed in the same way as other variables and objects, e.g., with print( ) or by typing its name at the prompt.
(i) Creating a DataFrame using a 2-D Dictionary:
A 2-D dictionary is a dictionary having items as (key:value), where value part is a data structure of any type i.e., another dictionary, an ndarray, a series object, a list, etc.
Value part of all the keys should have similar structure.
(a) Creating a dataframe from a 2D dictionary having values as lists:
>>>dict={'RNo':[51,52,53,54],'SName':
['Lahari','Chanakya','Harish','Neha'],
'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
Program to create a dataframe using 2-D Dictionary having values as lists:
import pandas as pd
dict={'RNo':[51,52,53,54],'SName':
['Lahari','Chanakya','Harish','Neha'],
'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
print(df)
output
By default, its index will be assigned 0 (zero) onwards.
Note: Some textbooks show the output columns in ascending order, i.e., “Marks”, then “RNo”, then “SName”. In practice, current versions of pandas display the columns in the order in which they were entered.
Specifying Own Index:
>>>df=pd.DataFrame(dict,index=['First','Second','Third','Fourth'])
Note: If the number of index labels does not match the number of rows in the data, a “ValueError” is raised.
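A minimal sketch of the mismatch described in the note above, assuming a trimmed version of the same student dictionary (the names data and raised are illustrative, not from the slides):

```python
import pandas as pd

# Four rows of data, but only three index labels are supplied.
data = {'RNo': [51, 52, 53, 54], 'Marks': [55, 62, 52, 75]}

raised = False
try:
    pd.DataFrame(data, index=['First', 'Second', 'Third'])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Supplying exactly four labels, one per row, avoids the error.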
Example: Create a DataFrame that uses state names as the index and “Mother Tongue” and “Population” as column names. Note: Population in crores.
Program:
import pandas as pd
dict={'Mother Tongue':['Telugu','Tamil','Hindi'],
'Population':[6,8,12]}
df=pd.DataFrame(dict,index=['AP','TN','Maharastra'])
print(df)
(b) Creating a dataframe from a 2D dictionary having values as ndarrays:
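A minimal sketch of this case, assuming the same roll numbers and marks as in the list-based example are held in NumPy arrays (the names rno, marks and data are illustrative):

```python
import numpy as np
import pandas as pd

# Each key becomes a column; each ndarray supplies that column's values.
rno = np.array([51, 52, 53, 54])
marks = np.array([55, 62, 52, 75])
data = {'RNo': rno, 'Marks': marks}

df = pd.DataFrame(data, index=['First', 'Second', 'Third', 'Fourth'])
print(df)
```

As with lists, all the arrays should have the same length.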
(c) Creating a dataframe from a 2D dictionary having values as dictionary object:
dict={'RNo':{'First':51,'Second':52,'Third':53,'Fourth':54},
'SName':{'First':'Lahari','Second':'Chanakya','Third':'Harish','Fourth':'Neha'},
'Marks':{'First':55,'Second':62,'Third':52,'Fourth':75}}
df=pd.DataFrame(dict)
dict={'First':{'RNo':51,'SName':'Lahari','Marks':55},
'Second':{'RNo':52,'SName':'Chanakya','Marks':62},
'Third':{'RNo':53,'SName':'Harish','Marks':52},
'Fourth':{'RNo':54,'SName':'Neha','Marks':75}}
df=pd.DataFrame(dict)
Special Condition:
A DataFrame can also be created when the inner dictionaries of a 2D dictionary have dissimilar (non-matching) keys.
All the inner keys become row indexes, and NaN is stored wherever an inner dictionary has no value for a key.
Program:
import pandas as pd
C1={'Qty':95,'Half Yearly':89}
C2={'Half Yearly':94,'Annual':97}
Marks={'Student 1':C1,'Student 2':C2}
df=pd.DataFrame(Marks)
print(df)
OUTPUT
(ii) Creating a Dataframe Object from a List of Dictionaries/Lists:
(a) Creating a Dataframe using a list having List of dictionaries :
If we pass a 2D list having dictionaries as its elements (list of dictionaries) to pandas.DataFrame() function, it will create a DataFrame object such that the inner dictionary keys will become the columns and inner dictionary’s values will make rows.
Ex:
import pandas as pd
dict1={'RNo':51,'SName':'Lahari','Marks':55}
dict2={'RNo':52,'SName':'Chanakya','Marks':62}
dict3={'RNo':53,'SName':'Harish','Marks':52}
dict4={'RNo':54,'SName':'Neha','Marks':75}
students=[dict1,dict2,dict3,dict4]
df=pd.DataFrame(students)
print(df)
Note : We can also include indexes as follows:
df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])
Note: If the dictionaries do not all use the same keys (column names), the missing entries are filled with “NaN” values.
Program:
import pandas as pd
dict1={'RNo':51,'SName':'Lahari','Marks':55}
dict2={'RNo':52,'Name':'Chanakya','Marks':62}
dict3={'RNo':53,'Name':'Harish','Marks':52}
dict4={'RNo':54,'SName':'Neha','Marks':75}
students=[dict1,dict2,dict3,dict4]
df=pd.DataFrame(students,index=['First','Second','Third','Fourth'])
print(df)
OUTPUT
(b) Creating using a list having List of lists:
lists=[[10,20,40],['A','B','C','D'],[33.5,55.75,2.5]]
df=pd.DataFrame(lists)
Inserting Rows & Column Names:
import pandas as pd
lists=[[51,'Lahari',55],[52,'Chanakya',62],[53,'Harish',52]]
#each inner list is a row
df=pd.DataFrame(lists,columns=['RNo','SName','Marks'],index=['First','Second','Third'])
print(df)
(iii) Creating a dataframe Object from a 2-D ndarray:
We can pass a two-dimensional NumPy array (i.e., one having shape (<m>,<n>)) to DataFrame( ) to create a dataframe object.
Consider the program to create np array:
import numpy as np
import pandas as pd
narr=np.array([[10,20,30],[40,50,60]],np.int32)
print(narr)
Program:
import numpy as np
import pandas as pd
narr=np.array([[10,20,30],[40,50,60]],np.int32)
mydf=pd.DataFrame(narr)
print(mydf)
Output of print(narr):
[[10 20 30]
 [40 50 60]]
OUTPUT
narr=np.array([[10.7,20.5],[40,50],[25.2,55]])
mydf=pd.DataFrame(narr,columns=["One","Two"],index=['A','B','C'])
print(mydf)
We can specify either columns or index or both the sequences.
Note: If the rows of the ndarray differ in length, i.e., if the number of elements in each row differs, then pandas creates just a single column in the dataframe object, and the type of that column is object.
Example:
narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]],
dtype="object")
print(narr)
Output
[list([10.7, 20.5, 30.2]) list([40, 50]) list([25, 55, 11, 45])]
Program:
narr=np.array([[10.7,20.5,30.2],[40,50],[25,55,11,45]], dtype="object")
mydf=pd.DataFrame(narr)
print(mydf)
Output
(iv) Creating a dataframe Object from a 2D Dictionary with Values as Series Objects:
import pandas as pd
RN=pd.Series([11,12,13,14])
SN=pd.Series(['Rajesh','Likhith','Navya','Bhavya'])
M=pd.Series([56,75,91,82])
studict={'RNo':RN,'SName':SN,'Marks':M}
mydf=pd.DataFrame(studict)
print(mydf)
Output
(v) Creating a dataframe Object from another dataframe Object:
Program:
import pandas as pd
dict={'RNo':[51,52,53,54],'SName':['Lahari','Chanakya',
'Harish','Neha'],'Marks':[55,62,52,75]}
df=pd.DataFrame(dict)
dfnew=pd.DataFrame(df)
print(dfnew)
OUTPUT
(new DataFrame created from existing DataFrame)
DATAFRAME ATTRIBUTES
All information related to a DataFrame such as its size, datatype, etc is available through its attributes.
Syntax to use a specific attribute:
<DataFrame object>.<attribute name>
Attribute | Description |
index | The index (row labels) of the DataFrame |
columns | The column labels of the DataFrame |
axes | It returns axis 0 i.e., index and axis 1 i.e., columns of the DataFrame |
dtypes | Return the data types of data in the DataFrame |
size | Return an int representing the number of elements in this object |
shape | Return a tuple representing the dimensionality of the DataFrame, i.e., (no. of rows, no. of columns) |
values | Return a Numpy representation of the DataFrame |
empty | Indicates whether the DataFrame is empty |
ndim | Return an int representing the number of axes/array dimensions. |
T | Transpose |
Example of a DataFrame DF:
Retrieving various properties of a DataFrame Object:
>>>df.index
Index(['First', 'Second', 'Third', 'Fourth'], dtype='object')
(for default indexes)
>>>df.index #DataFrame with 4 rows and no index specified
RangeIndex(start=0, stop=4, step=1)
>>> df.columns
Index(['RNo', 'SName', 'Marks'], dtype='object')
>>>df.axes
[Index(['First', 'Second', 'Third', 'Fourth'], dtype='object'), Index(['RNo', 'SName', 'Marks'], dtype='object')]
>>>df.dtypes
RNo int64
SName object
Marks int64
dtype: object
>>>df.size#4 rows X 3 columns
12
>>>df.shape #(no.of rows, no.of columns)
(4, 3)
>>>df.values# Numpy representation
[ [51 'Lahari' 55]
[52 'Chanakya' 62]
[53 'Harish' 52]
[54 'Neha' 75] ]
>>>df.empty
#if DataFrame is empty, gives True
False
>>>df.ndim # As DataFrame is a 2 Dimensional
2
>>>df.T
#Transpose. Rows will become columns and vice versa.
Function | Description |
len(<DF Object>) | Return the number of rows in a dataframe |
<DF Object>.count( ) | If we pass no argument or 0 (the default), it returns the count of non-NA values for each column; if we pass 1, it returns the count of non-NA values for each row. |
OTHERS
>>>len(df)
4
>>>df.count( )
#df.count(0) or df.count(axis='index')
RNo 4
SName 4
Marks 4
dtype: int64
>>>df.count(1) # df.count(axis='columns')
First 3
Second 3
Third 3
Fourth 3
dtype: int64
>>>df.shape[0]# to get number of rows
4
>>>df.shape[1]# to get number of columns
3
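In the outputs above every count equals the number of rows because no value is missing. A small sketch of how count( ) behaves when a value is absent, assuming hypothetical data in which the second student has no 'Marks' entry:

```python
import pandas as pd

# Hypothetical data: the second dictionary has no 'Marks' key,
# so pandas stores NaN in that cell.
students = [{'RNo': 51, 'SName': 'Lahari', 'Marks': 55},
            {'RNo': 52, 'SName': 'Chanakya'}]
df = pd.DataFrame(students)

print(len(df))            # number of rows, missing values or not
print(df.count())         # non-NA values in each column
print(df.count(axis=1))   # non-NA values in each row
```

len( ) counts rows regardless of missing values, while count( ) skips NaN cells.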
SELECTING/ACCESSING DATA
Selecting/Accessing a Column:
Syntax:<DataFrame object> [<column name>]
(or)<DataFrame object>.<column name>
>>>df['Private']
AP 100
TN 98
TS 110
Name: Private, dtype: int64
>>>df.Aided
AP 75
TN 92
TS 85
Name: Aided, dtype: int64
Selecting/Accessing Multiple Columns:
<DataFrame object>[ [<column name>,
<column name>,…..] ]
>>>df[['ZP','Aided','Govt']]
Selecting/Accessing a subset from a DataFrame by Row/Column Labels using loc:
To access row(s) and/or a combination of rows and columns by their labels, we can use loc.
Syntax: <DataFrame Object>.loc[<startrow>:<endrow>,<startcolumn>:<endcolumn>]
To access one row (loc)
Just give the row name/label.
>>>df.loc['AP',:] #df.loc['AP'] or df.loc['AP',]
Private 100
Aided 75
Govt 125
ZP 89
Name: AP, dtype: int64
>>>df.loc['Maharastra',:]
KeyError: 'Maharastra'
To access multiple rows (loc):
>>>df.loc['AP':'TS',] # df.loc['AP':'TS'] or df.loc['AP':'TS',:]
>>>df.loc[['TS','AP']] #df.loc[['TS','AP'],] or df.loc[['TS','AP'],:]
>>>df.loc[['TN','TS','AP']]
>>>df.loc['TN':'AP']
Empty DataFrame
Columns: [Private, Aided, Govt, ZP]
Index: [ ]
To access a column using loc:
>>>df.loc[:,'Private']
AP 100
TN 98
TS 110
Name: Private, dtype: int64
To access multiple columns using loc:
>>>df.loc[:,'Private':'Govt']
Note: All columns between start and end columns are listed.
>>>df.loc[:,'Aided':]
>>>df.loc[:,'Govt':'Private']
Empty DataFrame
Columns: [ ]
Index: [AP, TN, TS]
>>>df.loc[:,['Govt','Private','ZP']]
To access range of columns from a range of rows:
<DF Object>.loc[<startrow>:<endrow>,<startcolumn>:<endcolumn>]
>>>df.loc['AP':'TN','Aided':'ZP']
>>> df.loc['TN':'AP','Aided':'ZP']
Empty DataFrame
Columns: [Aided, Govt, ZP]
Index: [ ]
>>> df.loc['AP':'TS','ZP':'Aided']
Empty DataFrame
Columns: [ ]
Index: [AP, TN, TS]
Selecting/Accessing a subset from a DataFrame by Row/Column Positions using iloc:
With iloc, we can extract a subset from a dataframe using the numeric position of its rows and columns. iloc means integer location.
Syntax:
<DF Object>.iloc[<start row index>:<end row index>, <start col index>:<end column index>]
When we use iloc, then <startindex>:<endindex> given for rows and columns work like slices, and the end index is excluded.
Note: With loc, both start label and end label are included when given as start:end, but with iloc, like slices end index/position is excluded when given as start:end.
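The difference in the note above can be sketched as follows. The table is a stand-in for the schools dataframe used in these examples; the cells marked as assumed are placeholders, not values from the slides:

```python
import pandas as pd

# Illustrative table: cells marked "assumed" are placeholders.
df = pd.DataFrame({'Private': [100, 98, 110],
                   'Aided':   [75, 92, 85],
                   'Govt':    [125, 120, 110],   # TN value assumed
                   'ZP':      [89, 90, 91]},     # TN and TS values assumed
                  index=['AP', 'TN', 'TS'])

by_label = df.loc['AP':'TN', 'Private':'Aided']   # end labels are included
by_position = df.iloc[0:1, 0:1]                   # end positions are excluded

print(by_label.shape)      # (2, 2)
print(by_position.shape)   # (1, 1)
```

To get the same 2 x 2 block by position, the slice would be df.iloc[0:2, 0:2].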
To access one row (iloc):
>>>df.iloc[0] #df.iloc[0,] or df.iloc[0,:]
>>>df.iloc[2]
>>>df.iloc[4]
IndexError: single positional indexer is out-of-bounds
To access multiple rows (iloc):
>>>df.iloc[0:2] # df.iloc[0:2,] or df.iloc[0:2,:]
>>>df.iloc[0:55]
>>>df.iloc[1:20]
>>>df.iloc[1:1]
Empty DataFrame
Columns: [Private, Aided, Govt, ZP]
Index: [ ]
>>>df.iloc[[1,2]] # df.iloc[[1,2], ] or df.iloc[[1,2], : ]
>>>df.iloc[[2,0,1]]
To access a column using iloc:
>>>df.iloc[:,1]
>>>df.iloc[:,[1]]
To access multiple columns using iloc:
>>>df.iloc[:,1:3] #Excluding column 3
>>>df.iloc[:,[2,0]]
>>>df.iloc[:,1:]
>>>df.iloc[:,2:0]
Empty DataFrame
Columns: []
Index: [AP, TN, TS]
>>>df.iloc[:,[2,0,1]]
To access range of columns from a range of rows (iloc):
>>>df.iloc[1:3,0:2]#Rows 1,2 & Columns 0,1
>>>df.iloc[[1,2],[0,1,2]]
>>>df.iloc[2:2,0:2]
Empty DataFrame
Columns: [Private, Aided]
Index: []
>>>df.iloc[1:3,2:0]
Empty DataFrame
Columns: []
Index: [TN, TS]
>>>df.iloc[[1,3],0:2]
IndexError: positional indexers are out-of-bounds
Selecting or Accessing Individual Value:
(i) Either give name of row or numeric index in square brackets.
Syntax:<DF Object>.<column> [<row name or row numeric index>]
Ex:>>>df.Govt['AP']
125
# df.Govt['AP']=200 will change the value to 200
>>>df.Govt[2]
110
(ii) .at function: Access a single value for a row/column label pair by labels.
Syntax:<DF Object>.at[<row label>,<col label>]
Ex: >>> df.at['AP','Aided']
75
# df.at['AP','Aided']=500 will change the value to 500
>>> df.at['Meghalaya','Aided']
KeyError: 'Meghalaya'
>>> df.at['AP','Orissa']
KeyError: 'Orissa'
(iii) .iat function: Access a single value for a row/column pair by integer position.
Syntax: <DF Object>.iat[<row index no>,<col index no>]
>>>df.iat[1,0]
98
# df.iat[1,0]=777 will change the value to 777
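A short sketch comparing .at and .iat on the same cell, assuming only the 'Private' and 'Aided' columns of the example table (the names v_label and v_pos are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110],
                   'Aided':   [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

v_label = df.at['TN', 'Private']   # by row label and column label
v_pos = df.iat[1, 0]               # same cell, by row and column position
print(v_label, v_pos)

df.at['AP', 'Aided'] = 500         # assignment changes the cell in place
print(df.iat[0, 1])
```

Both accessors read or write exactly one cell; only the way the cell is addressed differs.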
ADDING/MODIFYING ROWS/COLUMNS
The process of adding and modifying rows/columns value is similar.
Adding/Modifying a Row:
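We can change or add rows to a DataFrame using the at or loc attributes. (at is documented for single-value access; whether row-wise assignment through it works depends on the pandas version, so loc is the safer choice.)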
at:<DF Object>.at[<row name>,:]=<new value>
If there is a row with the given row label, it changes the values.
>>> df.at['AP']=123
>>>df
>>> df.at['TN',:]=[200,300,400,500]
>>>df
>>> df.at['TS',:]=[10,11,12]
ValueError: could not broadcast input array from shape (3,) into shape (4,)
If there is no row with such row label, adds new row with this row label and assigns given values to all its columns.
When a new row is added with at, the existing column data may be converted to float.
>>> df.at['MP']=[111,222,333,444]
>>>df.at[1]=[20,21,22,23]
>>>df
loc( ):
>>>df.loc['AP',:]=[300,301,302,303]
>>>df.loc['TN',:]=401,402,403,404
>>>df
>>>df.loc['MP',:]=501,502,503,504
>>>df
>>>df.loc['Orissa',:]=601,602,603
ValueError: could not broadcast input array from shape (3,) into shape (4,)
iloc( ):
>>>df.iloc[1]=[1,2,3,4]
>>>df
>>>df.iloc[2,:]=[5,6,7,8]
>>>df.iloc[[0,1]]=[[100,200,300,400],[11,22,33,44]]
>>>df
Adding/Modifying a Column:
Assigning a value to a column:
Syntax:
<DF Object>.<column name>=<new value>
(or)<DF Object>[<column>]=<new value>
Let us consider our original dataframe:
>>>df['ZP']=99
>>>df
>>>df['Corporate']=[44,55,66]
>>>df
Let us consider our original dataframe:
>>>df.Aided=11,22,33 #df.Aided=[11,22,33]
>>>df
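Note: If we assign to a column name that does not yet exist using dot notation, e.g.,
>>>df.corporate=11,12,13 or
>>>df.corporate=[11,12,13],
no error is raised (pandas may show a warning), but no column is added to the DataFrame; only a Python attribute is set on the object. Use df['corporate']=<values> to add a new column.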
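A minimal sketch of this pitfall, assuming a one-column version of the example table. The warnings filter is only there to keep the output quiet, since pandas may warn about this pattern:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110]}, index=['AP', 'TN', 'TS'])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence any warning pandas raises here
    df.corporate = [11, 12, 13]       # sets a Python attribute, not a column

print('corporate' in df.columns)      # False

df['corporate'] = [11, 12, 13]        # square brackets do create the column
print('corporate' in df.columns)      # True
```

Dot notation is therefore suitable for reading or updating an existing column, not for creating one.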
Other ways to add columns:
<DF object>.at[:,<column name>]=<values for column>
<DF object>.loc[:,<column name>]=<values for column>
>>>df.loc[:,'Govt']=200
>>>df.loc[:,'ZP']=300,400,500
>>>df.loc[:,'Cor']=11,12,13
>>>df.loc[:,'Private']=[10,11,12]
iloc( ):
>>>df.iloc[:,3]=1200
>>>df.iloc[:,1:3]=[[1,2],[3,4],[5,6]]
>>>df
<DF object>=<DF object>.assign(<column name>=<values for column>)
>>>df=df.assign(Private=[10,11,12])
>>>df=df.assign(Corporate=[33,34,35])
>>>df
>>> df2=df.assign(Aided=777)
>>> df2
>>>df=df.assign(New=[55,56])
ValueError: Length of values (2) does not match length of index (3)
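As the syntax above suggests, assign( ) returns a new dataframe instead of changing the one it is called on. A small sketch, assuming a two-column version of the example table:

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110],
                   'Aided':   [75, 92, 85]},
                  index=['AP', 'TN', 'TS'])

# One call can overwrite an existing column and add a new one.
df2 = df.assign(Aided=777, Corporate=[33, 34, 35])

print(list(df.columns))    # original dataframe is unchanged
print(list(df2.columns))   # new dataframe has the extra column
```

This is why the earlier examples write df=df.assign(...) when the change should be kept under the same name.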
Modifying a Single Cell :
(i)>>> df.Govt[2]=777
>>> df
(ii)>>> df.at['TN','Govt']=999
>>> df
>>> df.at['MP','ZP']
KeyError: 'MP'
(iii)>>> df.iat[0,3]=2022
>>> df
(iv)>>> df.loc['TS','ZP']=555
>>> df
(v) >>> df.iloc[2,0]=333
>>> df
RENAMING ROWS/COLUMNS
To change the name of any row/column individually, we can use the rename( ) function.
rename( ) function by default does not make changes in the original dataframe. It creates a new dataframe with the changes and the original dataframe remains unchanged.
Syntax:
<DF>.rename(index={<names dictionary>},
columns={<names dictionary>}, inplace=False)
>>> df.rename(index={'TN':'Tamil Nadu',
'AP':'Andhra Pradesh'},inplace=True)
>>> df
>>>df.rename(columns={'Private':'Personal',
'ZP':'Zilla Parishad'},inplace=True)
>>> df
Another Example:
>>> df.rename(index={'Ben':'Benches','Tab':'Tables'},
columns={'A':'Sec A','B':'Sec B','C':'Sec C'})
>>> df
Note: If we do not add “inplace=True”, the command only displays the renamed result; the original dataframe is not modified. To modify the dataframe itself, we need to add “inplace=True”.
(inplace=True performs the rename operation in the same dataframe)
>>> df.rename(index={'Ben':'Benches','Tab':'Tables'},
columns={'A':'Sec A','B':'Sec B','C':'Sec C'},inplace=True)
>>> df
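The effect of leaving out inplace=True can be sketched as follows, assuming a small dataframe with hypothetical values for the 'Ben' and 'Tab' rows:

```python
import pandas as pd

# Hypothetical values; only the labels matter for this sketch.
df = pd.DataFrame({'A': [30, 2], 'B': [28, 3]}, index=['Ben', 'Tab'])

renamed = df.rename(index={'Ben': 'Benches'}, columns={'A': 'Sec A'})

print(list(df.index), list(df.columns))             # original labels remain
print(list(renamed.index), list(renamed.columns))   # renamed copy
```

Storing the returned dataframe under a name is an alternative to using inplace=True.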
DELETING ROWS/COLUMNS
Two ways to delete: the drop( ) function (rows and columns) and the del statement (columns only).
(i) Delete row(s) using drop( ) function:
Syntax:<DF>.drop(index or sequence of indexes)
>>> df.drop(["TS","AP"],inplace=True)
or
>>> df.drop(["TS","AP"],axis=0,inplace=True)
>>> df
(ii) Delete a column, using drop( ) function:
>>> df.drop(['Private','Aided','ZP'],axis=1,inplace=True)
>>> df
>>> df.drop(["Aided"],axis=1,inplace=True)
>>> df
(Other examples of drop( ):
df.drop(range(2,15,3)) – drops indexes 2,5,8,11,14
df.drop([2,4,6,8,12])
Argument to drop( ) should be either an index, or a sequence containing indexes.)
(iii) Delete a column, using del statement:
Syntax: del <DF object>[<column name>]
>>> del df['ZP']
>>> df
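A brief sketch contrasting the two approaches, assuming a three-column version of the example table (the ZP values for TN and TS are placeholders):

```python
import pandas as pd

df = pd.DataFrame({'Private': [100, 98, 110],
                   'Aided':   [75, 92, 85],
                   'ZP':      [89, 90, 91]},    # TN and TS values assumed
                  index=['AP', 'TN', 'TS'])

rows_dropped = df.drop(['TS', 'AP'])        # without inplace=True, a new dataframe is returned
cols_dropped = df.drop(['Aided'], axis=1)   # axis=1 targets columns
del df['ZP']                                # del always changes df itself

print(list(rows_dropped.index))
print(list(cols_dropped.columns))
print(list(df.columns))
```

drop( ) works for both rows and columns, whereas del removes one column at a time.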
BOOLEAN INDEXING
Boolean Indexing, refers to having Boolean Values [(True or False) or (1 or 0) sometimes] as indexes of a dataframe.
The Boolean indexes divide the dataframe in two groups – True rows and False rows.
In some situations, we may need to divide our data in two subsets – True or False, e.g., your school has decided to launch online classes for you. But some days of the week are designated for it. So, a dataframe related to this information might look like:
| Day | No.of Classes |
True | Monday | 5 |
False | Tuesday | 0 |
True | Wednesday | 3 |
False | Thursday | 4 |
True | Friday | 7 |
True | Saturday | 2 |
Creating DataFrames with Boolean Indexing:
While creating a dataframe with Boolean indexes, True and False must not be enclosed in quotes. Quoted values are strings, so looking up rows with df.loc[True] would then raise a KeyError.
Create a dataframe containing online classes information, through the code :
import pandas as pd
Days=['Mon','Tue','Wed','Thu','Fri','Sat']
Classes = [5,0,3,4,7,2]
dict={'Days':Days,'No.of Classes':Classes}
df=pd.DataFrame(dict,
index=[True,False,True,False,True,True])
print(df)
This division is useful when we want to find out things like: on which days are the online classes held? Which days are for offline classes? And so on.
We can also provide Boolean indexing to dataframes as 1s and 0s.
df=pd.DataFrame(dict,index=[1,0,1,0,1,1])
Accessing Rows from DataFrames with Boolean Indexing:
Boolean indexing is very useful for filtering records i.e., for finding or extracting the True or False indexed rows.
<DF>.loc[True] | Display all records with True index |
<DF>.loc[False] | Display all records with False index |
<DF>.loc[1] | Display all records with index as 1 |
<DF>.loc[0] | Display all records with index as 0 |
Ex:
Days=['Mon','Tue','Wed','Thu','Fri','Sat']
Classes = [5,0,3,4,7,2]
dict={'Days':Days,'No.of Classes':Classes}
df=pd.DataFrame(dict,
index=[True,False,True,False,True,True])
>>>df.loc[True] #df.loc[1]
>>>df.loc[False] #df.loc[0]
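The two lookups above can be sketched as a complete program, reusing the online-classes data (the names online and offline are illustrative):

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
classes = [5, 0, 3, 4, 7, 2]
df = pd.DataFrame({'Days': days, 'No.of Classes': classes},
                  index=[True, False, True, False, True, True])

online = df.loc[True]      # all rows whose index label is True
offline = df.loc[False]    # all rows whose index label is False

print(online)
print(offline)
```

Each lookup returns a dataframe because the label True (or False) occurs more than once in the index.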
PYTHON PANDAS 2 – Iterating Over a DataFrame
| Teachers | Students | Area | Ratio |
Private | 30 | 290 | Urban | 9.66 |
Govt | 18 | 185 | Rural | 10.27 |
Aided | 15 | 120 | Rural | 8 |
CBSE | 35 | 325 | Urban | 9.28 |
ICSE | 25 | 260 | Urban | 10.4 |
iterrows( ) : This method iterates over dataframe row wise where each horizontal subset is in the form of (row-index,Series) where Series contains all column values for that row-index
>>> dict={'Teachers':[30,18,15,35,25],'Students':[290,185,120,325,260],'Area':['Urban','Rural', \
'Rural','Urban','Urban'], 'Ratio':[9.66,10.27,8,9.28,10.4]}
>>> DF=pd.DataFrame(dict,index=['Private','Govt','Aided','CBSE','ICSE'])
Example : Using iterrows( ) to extract data from dataframe row wise.
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for (row,rowSeries) in DF.iterrows():
    print("Row index:",row)
    print("Containing: ")
    print(rowSeries)
OUTPUT
Row index: Private
Containing:
Teachers 30
Students 290
Area Urban
Ratio 9.66
Name: Private, dtype: object
Row index: Govt
Containing:
Teachers 18
Students 185
Area Rural
Ratio 10.27
Name: Govt, dtype: object
--------
--------
Example : Using iterrows( ) to extract row-wise Series objects
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for (row,rowSeries) in DF.iterrows():
    print("Row index:",row)
    print("Containing: ")
    i=0
    for val in rowSeries:
        print("At",i,"position: ",val)
        i=i+1
OUTPUT
Row index: Private
Containing:
At 0 position: 30
At 1 position: 290
At 2 position: Urban
At 3 position: 9.66
Row index: Govt
Containing:
At 0 position: 18
At 1 position: 185
At 2 position: Rural
At 3 position: 10.27
Row index: Aided
-----
-----
iteritems( ) : This method iterates over a dataframe column wise, where each vertical subset is in the form of (col-index,Series) and the Series contains all row values for that column-index. (In pandas 2.0 and later, iteritems( ) has been removed; items( ) works the same way.)
Example : Using iteritems( ) to extract data from dataframe column wise.
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for (col,colSeries) in DF.items():    # items( ) replaces iteritems( ) in pandas 2.0 and later
    print("Column index:",col)
    print("Containing: ")
    print(colSeries)
OUTPUT
Column index: Teachers
Containing:
Private 30
Govt 18
Aided 15
CBSE 35
ICSE 25
Name: Teachers, dtype: int64
Column index: Students
Containing:
Private 290
Govt 185
Aided 120
CBSE 325
ICSE 260
Name: Students, dtype: int64
--------
--------
--------
Example : Using iteritems( ) to extract dataframe column wise series object
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for (col,colSeries) in DF.items():    # items( ) replaces iteritems( ) in pandas 2.0 and later
    print("Column index:",col)
    print("Containing: ")
    i=0
    for val in colSeries:
        print("At row ",i,":",val)
        i=i+1
OUTPUT
Column index: Teachers
Containing:
At row 0 : 30
At row 1 : 18
At row 2 : 15
At row 3 : 35
At row 4 : 25
Column index: Students
Containing:
At row 0 : 290
At row 1 : 185
At row 2 : 120
At row 3 : 325
At row 4 : 260
Column index: Area
--------
--------
--------
Write a program to print the DataFrame DF, one row at a time
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for i,j in DF.iterrows():
    print(i)
    print(j)
    print("____________")
OUTPUT
Private
Teachers 30
Students 290
Area Urban
Ratio 9.66
Name: Private, dtype: object
____________
Govt
Teachers 18
Students 185
Area Rural
Ratio 10.27
Name: Govt, dtype: object
____________
--------
Write a program to print the DataFrame DF, one column at a time
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for i,j in DF.items():    # items( ) replaces iteritems( ) in pandas 2.0 and later
    print(i)
    print(j)
    print("____________")
OUTPUT
Teachers
Private 30
Govt 18
Aided 15
CBSE 35
ICSE 25
Name: Teachers, dtype: int64
____________
Students
Private 290
Govt 185
Aided 120
CBSE 325
ICSE 260
Name: Students, dtype: int64
_____________
--------
Printing individual column values from a row:
When accessing rows of a DataFrame using iterrows(), we can print an individual column value from a row by using rowSeries[<column>].
i.e., after the line,
for r, Row in df.iterrows( ):
You can print individual column value as :
Row[<column name>]
Write a program to print only the values from Teachers column, for each row
import pandas as pd
dict={'Teachers':[30,18,15,35,25],\
'Students':[290,185,120,325,260],\
'Area':['Urban','Rural','Rural','Urban','Urban'],\
'Ratio':[9.66,10.27,8,9.28,10.4]}
DF=pd.DataFrame(dict,index=['Private','Govt','Aided',\
'CBSE','ICSE'])
for row,rowSeries in DF.iterrows():
    print(rowSeries['Teachers'])
    print("------")
OUTPUT
30
------
18
------
15
------
35
------
25
------
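Individual column values picked up inside iterrows( ) can also be combined. A small sketch, assuming only the first three rows of the table and a hypothetical dictionary named ratios to hold the results:

```python
import pandas as pd

DF = pd.DataFrame({'Teachers': [30, 18, 15],
                   'Students': [290, 185, 120]},
                  index=['Private', 'Govt', 'Aided'])

ratios = {}
for label, rowSeries in DF.iterrows():
    # students per teacher, computed from two columns of the same row
    ratios[label] = rowSeries['Students'] / rowSeries['Teachers']

for label in ratios:
    print(label, round(ratios[label], 2))
```

For large dataframes, a column-wise expression such as DF['Students']/DF['Teachers'] gives the same values without an explicit loop.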
BINARY OPERATIONS IN A DATAFRAME
Binary operations are operations that require two values, and these values are picked element-wise. In a binary operation, the data from two dataframes is aligned on the basis of their row and column indexes. For each matching row and column index, the given operation is performed; for each non-matching row or column index, NaN is stored in the result.
Binary Operations: addition, subtraction, multiplication, division
Example Data Frames : DF1, DF2, DF3, DF4
Program to create the dataframes DF1, DF2, DF3, DF4
import pandas as pd
dict1={'A':[11,17,23],'B':[13,19,25],'C':[15,21,27]}
DF1=pd.DataFrame(dict1)
dict2={'A':[12,18,24],'B':[14,20,26],'C':[16,22,28]}
DF2=pd.DataFrame(dict2)
dict3={'A':[1,3,5],'B':[2,4,6]}
DF3=pd.DataFrame(dict3)
dict4={'A':[7,9],'B':[8,10]}
DF4=pd.DataFrame(dict4)
Addition : [ Using +, add( ), radd( ) ]
Note : DF1.add(DF2) is equal to DF1+DF2
DF1.radd(DF2) is equal to DF2+DF1
radd( ) means reverse addition
>>>DF1+DF2 #DF1.add(DF2)
>>>DF1+DF3
>>>DF1+DF4
>>>DF2+DF3
>>>DF2+DF4
>>>DF3+DF4
>>>DF3.add(DF4)
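The alignment rule can be seen by adding dataframes of different shapes. A sketch using DF1, DF3 and DF4 as defined in the program above (the names s13 and s34 are illustrative):

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [11, 17, 23], 'B': [13, 19, 25], 'C': [15, 21, 27]})
DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

s13 = DF1 + DF3   # column C exists only in DF1, so it becomes NaN
s34 = DF3 + DF4   # row 2 exists only in DF3, so it becomes NaN

print(s13)
print(s34)
```

Columns or rows present in only one operand appear in the result, but hold NaN.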
Subtraction: [ Using -, sub( ), rsub( ) ]
Note : DF1.sub(DF2) is equal to DF1-DF2
DF1.rsub(DF2) is equal to DF2-DF1
rsub( ) means reverse subtraction
>>>DF1-DF2
>>>DF3-DF1
>>>DF1-DF4
>>>DF4-DF1
>>>DF2-DF3
>>>DF3-DF2
>>>DF2-DF4
>>>DF4-DF2
>>>DF3-DF4
#DF3.sub(DF4)
>>>DF4-DF3
#DF3.rsub(DF4)
Multiplication: [ Using *, mul( ), rmul( ) ]
Note : DF1.mul(DF2) is equal to DF1*DF2
DF1.rmul(DF2) is equal to DF2*DF1
rmul( ) means reverse multiplication
>>>DF1*DF2
>>>DF1*DF3
>>>DF1*DF4
>>>DF2*DF3
>>>DF2*DF4
>>>DF3*DF4
Division: [ Using /, div( ), rdiv( ) ]
Note : DF1.div(DF2) is equal to DF1/DF2
DF1.rdiv(DF2) is equal to DF2/DF1
rdiv( ) means reverse division.
>>>DF1/DF2
>>>DF2/DF1
>>>DF1/DF3
>>>DF3/DF1
>>>DF1/DF4
>>>DF2/DF3
>>>DF3/DF2
>>>DF2/DF4
>>>DF4/DF2
>>>DF3/DF4
>>>DF4/DF3
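The "reverse" methods noted in the sections above can be checked against the plain operators. A sketch using DF3 and DF4 as defined earlier (the names same_sub and same_div are illustrative):

```python
import pandas as pd

DF3 = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 4, 6]})
DF4 = pd.DataFrame({'A': [7, 9], 'B': [8, 10]})

# rsub and rdiv swap the operands: DF3.rsub(DF4) computes DF4 - DF3.
same_sub = DF3.rsub(DF4).equals(DF4 - DF3)
same_div = DF3.rdiv(DF4).equals(DF4 / DF3)

print(same_sub, same_div)
print(DF4 / DF3)   # row 2 is NaN because DF4 has no row 2
```

The order matters for subtraction and division, which is where the reverse forms are most useful.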