1 of 60

SERIES PPT

XII – IP –PYTHON

 

2022-23 Python Syllabus

 

Data Handling using Pandas and Data Visualization

(25 Marks)

2 of 60

Introduction to Python libraries - Pandas, Matplotlib.

Data structures in Pandas - Series and Data Frames.

Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical operations;

Head and Tail functions; Selection, Indexing and Slicing.

Data Frames: Creation - from dictionary of Series, list of dictionaries, Text/CSV files; display; iteration; Operations on rows and columns: add, select, delete, rename; Head and Tail functions; Indexing using Labels, Boolean Indexing;

Importing/Exporting Data between CSV files and Data Frames.

Data Visualization: Purpose of plotting; drawing and saving following types of plots using Matplotlib – line plot, bar graph, histogram.

Customizing plots: adding label, title, and legend in plots.

3 of 60

PANDAS INTRODUCTION

Pandas or Python Pandas is Python’s library for data analysis. Pandas has derived its name from “panel data”, which is an econometrics term for multi-dimensional, structured data sets. Pandas has become a popular choice for data analysis.

Data analysis refers to the process of evaluating big data sets using analytical and statistical tools to discover useful information and conclusions that support business decision-making.

The main author of Pandas is Wes McKinney.

Pandas is an open-source, BSD-licensed library built for the Python programming language.

Pandas offers high-performance, easy-to-use data structures and data analysis tools.

We need to import pandas: import pandas (or) import pandas as <identifier>

Ex: import pandas as pd

If we use NumPy arrays, we also import NumPy: import numpy as np

4 of 60

Why Pandas?

  • Pandas is the most popular library in the scientific Python ecosystem for doing data analysis.
  • It can read and write data in many different file formats (CSV, Excel, JSON, SQL databases, etc.) and handles many data types (integer, float, string, etc.).
  • It can calculate in all the ways data is organized, i.e., across rows and down columns.
  • It can easily select subsets of data from bulky data sets and even combine multiple data sets together. It has functionality to find and fill missing data.
  • It allows you to apply operations to independent groups within the data.
  • It supports reshaping of data into different forms.
  • It supports advanced time-series functionality (time-series forecasting is the use of a model to predict future values based on previously observed values).
  • It supports visualization by integrating with libraries such as Matplotlib and Seaborn.

Pandas is well suited to handling large tabular data sets whose columns hold different types of data.
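Two of the claims above (finding and filling missing data, and applying operations to independent groups) can be seen in a few lines. This is only a sketch: the sales figures and region labels are hypothetical, and it uses the Series methods isna( ), fillna( ) and groupby( ).

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures; one value is missing (NaN)
sales = pd.Series([120, np.nan, 90, 150],
                  index=['North', 'South', 'North', 'South'])

missing = sales.isna().sum()             # find missing data: count the NaN entries
filled = sales.fillna(0)                 # fill missing data with 0
totals = filled.groupby(level=0).sum()   # one sum per group of equal index labels

print(missing)   # 1
print(totals)    # North 210.0, South 150.0
```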

5 of 60

Pandas Data Structure:


A data structure is a particular way of storing and organizing data in a computer to suit a specific purpose so that it can be accessed and worked with in appropriate ways.

We can think of Pandas data structures as enhanced versions of NumPy structured arrays, in which the rows and columns can be identified and accessed with labels rather than simple integer indices.

Out of the many data structures of Pandas, two basic data structures, Series and DataFrame, are universally popular for their dependability. (Older versions of Pandas also had a Panel data structure; it was removed in pandas 0.25 and is not in the syllabus.)

6 of 60

Property-wise comparison of Series and DataFrame:

Dimensions
  • Series: 1-dimensional
  • DataFrame: 2-dimensional

Type of data
  • Series: Homogeneous, i.e., all the elements of a Series object share one dtype (mixed values are stored together under the general dtype object)
  • DataFrame: Heterogeneous, i.e., a DataFrame object can have elements (columns) of different data types

Mutability
  • Series: Value-mutable, i.e., element values can change; but size-immutable, i.e., the size of a Series object, once created, cannot change. If we want to add/drop an element, a new Series object is created internally.
  • DataFrame: Value-mutable, i.e., element values can change; and size-mutable, i.e., the size of a DataFrame object can change in place. That is, you can add/drop elements in an existing DataFrame object.

7 of 60

[Figure: Series, with Index 1, 2, 3, 4 and Data 'A', 'B', 'C', 'D'; DataFrame]

8 of 60

SERIES

A Series is a Pandas data structure that represents a one-dimensional array of indexed data.

It is an array-like object containing an array of data of any NumPy data type, together with an associated array of data labels, called its index.

A Series type object has two main components:

* An array of actual data

* An associated array of indexes or data labels.

Both components are one-dimensional arrays with the same length. The index is used to access individual values.
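A minimal sketch of a Series built from the index labels 1 to 4 and the data 'A' to 'D' used in the diagram on the previous slide; the variable name S is an illustrative choice. It shows the two components separately through the index and values attributes.

```python
import pandas as pd

# Index labels 1..4 paired with data values 'A'..'D'
S = pd.Series(['A', 'B', 'C', 'D'], index=[1, 2, 3, 4])

print(S.index.tolist())    # [1, 2, 3, 4]
print(S.values.tolist())   # ['A', 'B', 'C', 'D']
print(S[3])                # C  (3 is used as an index label here)
```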

9 of 60

CREATING SERIES OBJECTS

A Series object can be created in many ways using pandas library’s Series( ). First import pandas and numpy modules with import statements.

If we import pandas as pd, we can use pd.Series( ) instead of pandas.Series( ).

 

(i) Creating an empty Series object:

Syntax:

<Series Object>=pandas.Series( ) # note the capital S in Series

Ex: S=pd.Series( )

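Creates an empty Series S with no values. Its default dtype is float64 in pandas versions before 2.0 (which also show a warning); from pandas 2.0 onward the default dtype is object.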
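Because the default dtype of an empty Series depends on the pandas version, a safe habit (sketched below) is to pass dtype explicitly; the empty attribute and len( ) then confirm that the object holds no values.

```python
import pandas as pd

# Passing dtype explicitly gives the same result in every pandas version
S = pd.Series(dtype='float64')

print(S)          # Series([], dtype: float64)
print(S.empty)    # True
print(len(S))     # 0
```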

10 of 60

(ii) Creating non-empty Series objects:

Specify arguments for data and indexes.

Syntax:

<Series object>=pd.Series(data,index=idx)

Here, idx is a sequence of index labels (the same length as data) and data is the data part of the Series object. data can be one of the following:

  • A Python sequence
  • An ndarray
  • A Python dictionary
  • A scalar value

Note: If we do not give an index, the default index consists of the integers 0 through N-1 (N is the length of data).
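The difference between the default index and an explicit index can be sketched as follows; the student names used as labels are hypothetical.

```python
import pandas as pd

marks = [78, 45, 87]
S_default = pd.Series(marks)                                 # index 0..N-1
S_custom = pd.Series(marks, index=['Ravi', 'Sita', 'John'])  # our own labels (hypothetical)

print(S_default.index.tolist())  # [0, 1, 2]
print(S_custom.index.tolist())   # ['Ravi', 'Sita', 'John']
```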

11 of 60

(a) Specify data as Python Sequence:

Syntax:

<Series Object>=pandas.Series(<any Python sequence>)

It will return an object of Series type.

Using Lists:

 >>> S=pd.Series([78,45,87])

>>> S

0 78

1 45

2 87

dtype: int64

>>> S=pd.Series([15.9,23.7])

>>> S

0 15.9

1 23.7

dtype: float64

12 of 60

>>> S=pd.Series([2.5,7.,9.2])

>>> S

0 2.5

1 7.0

2 9.2

dtype: float64

Note: 7. is stored and displayed as 7.0

>>>S=pd.Series(["Welcome","To","my","School"])

>>> S

0 Welcome

1 To

2 my

3 School

dtype: object

13 of 60

>>> S=pd.Series([10,20.5])

>>> S

0 10.0

1 20.5

dtype: float64

 

>>> S=pd.Series([2.7,5,"Welcome"])

>>> S

0 2.7

1 5

2 Welcome

dtype: object

 

Note: Left column displays index and right column displays values.

14 of 60

Using Tuples:

>>> S=pd.Series((15,20,25))

>>> S

0 15

1 20

2 25

dtype: int64

 

>>> S=pd.Series((15.5,17))

>>> S

0 15.5

1 17.0

dtype: float64

15 of 60

>>> S=pd.Series((10,15.5,"Welcome to World"))

>>> S

0 10

1 15.5

2 Welcome to World

dtype: object

 

String: A string is treated as a single scalar value, not as a sequence of characters.

 

>>> S=pd.Series("Welcome to World")

>>> S

0 Welcome to World

dtype: object

16 of 60

range() function: It generates a sequence

Ex: range(7) generates a sequence [0,1,2,3,4,5,6]

 

>>> S=pd.Series(range(7))

>>> S

0 0

1 1

2 2

3 3

4 4

5 5

6 6

dtype: int64

17 of 60

Program to create a Series object using the Python sequence

[10,15.9,"Welcome Friends"].

Solution:

import pandas as pd

S1=pd.Series([10,15.9,"Welcome Friends"])

print("Series Object is : ")

print(S1)

Output

Series Object is :

0 10

1 15.9

2 Welcome Friends

dtype: object

 

18 of 60

(b) Specify data as an ndarray:

NumPy contains the function arange( ) with the following syntax:

arange(start, stop, step)

Ex: np.arange(20,30,3) generates [20 23 26 29]

np.arange(20,30,2.5) generates [20. 22.5 25. 27.5]

# the stop value is excluded

19 of 60

Program to create a Series object using an ndarray, created with NumPy's arange( ) function, that holds sequences between 20 and 30.

 import pandas as pd

import numpy as np

nda1=np.arange(20,30,3)

nda2=np.arange(20,30,2.5)

print("Numpy array 1",nda1)

print("Numpy array 2",nda2)

S1=pd.Series(nda1)

S2=pd.Series(nda2)

print("Series 1\n",S1)

print("Series 2\n",S2) 

# We can also write it directly: S1=pd.Series(np.arange(20,30,3))

Output

Numpy array 1 [20 23 26 29]

Numpy array 2 [20. 22.5 25. 27.5]

Series 1

0 20

1 23

2 26

3 29

dtype: int32

Series 2

0 20.0

1 22.5

2 25.0

3 27.5

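dtype: float64

Note: The integer dtype of an ndarray depends on the platform: int32 (as in Series 1) appears on Windows with NumPy versions before 2.0, while Linux, macOS and NumPy 2.x give int64.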

20 of 60

NumPy contains the function linspace( ) with the following syntax:

linspace(start, stop, num) # generates num evenly spaced values from start to stop; the stop value is included

Ex: np.linspace(20,30,6) generates [20. 22. 24. 26. 28. 30.]

Program to create a Series object using an ndarray which uses linspace function(numpy array) to generate sequences between 20 and 30.

import pandas as pd

import numpy as np

nda1=np.linspace(20,30,4)

nda2=np.linspace(20,30,6)

S1=pd.Series(nda1)

S2=pd.Series(nda2)

print("Series 1\n",S1)

print("Series 2\n",S2)

Output

Series 1

0 20.000000

1 23.333333

2 26.666667

3 30.000000

dtype: float64

Series 2

0 20.0

1 22.0

2 24.0

3 26.0

4 28.0

5 30.0

dtype: float64

21 of 60

NumPy contains the function tile( ), which repeats (tiles) a list a given number of times.

Ex: np.tile([5,10],3) generates [5,10,5,10,5,10]

Program to create a Series object using an ndarray that is created by tiling the list [5,10] 3 times.

 

import numpy as np

import pandas as pd

S=pd.Series(np.tile([5,10],3))

print(S)

Output

0 5

1 10

2 5

3 10

4 5

5 10

dtype: int32

22 of 60

(c) Specify data as a Python Dictionary:

Keys of the dictionary become the index of the Series, and values of the dictionary become the data of the Series object. With Python 3.6+ and pandas 0.23 or later, the index keeps the order in which the keys were inserted into the dictionary (older versions could sort the keys instead).

 

Program to create a Series object using a dictionary that stores the section-wise averages of the toppers of class X in a school.

import pandas as pd

Stu={'A':89.5,'B':92.34,'C':91.5}

S=pd.Series(Stu)

print(S)

Output

A 89.50

B 92.34

C 91.50

dtype: float64
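When a dictionary is passed together with an index argument, pandas picks only the listed keys, in the listed order, and a label that is not a key gets NaN. A small sketch reusing the Stu dictionary from the program above (the label 'D' is added for illustration):

```python
import pandas as pd

Stu = {'A': 89.5, 'B': 92.34, 'C': 91.5}
# Only the listed keys are used, in the listed order; 'D' is not a key, so it gets NaN
S = pd.Series(Stu, index=['C', 'A', 'D'])
print(S)
# C    91.5
# A    89.5
# D     NaN
# dtype: float64
```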

23 of 60

(d) Specifying data as scalar value:

The data can be in the form of a single value or a scalar value. If data is a scalar value and the index argument is given, the scalar value is repeated to match the length of the index. If no index is given, a Series with just one element (at index 0) is created.

The index argument has to be a sequence of numbers or labels of any type.

>>> Marks=pd.Series(92)

>>> Marks

0 92

dtype: int64

24 of 60

>>> Marks=pd.Series(95,index=[11,12,13])

>>> Marks

11 95

12 95

13 95

dtype: int64

>>>Unknown=pd.Series('I don\'t know',index=['Un1','Un2'])

>>> Unknown

Un1 I don't know

Un2 I don't know

dtype: object

 

>>> Capital=pd.Series('Delhi',index=['State 1', 'State 2','State 3'])

>>> Capital

State 1 Delhi

State 2 Delhi

State 3 Delhi

dtype: object

25 of 60

>>> prizes=pd.Series(12,index=range(1,5))

>>> prizes

1 12

2 12

3 12

4 12

dtype: int64

>>> cer=pd.Series("Welcome",index=range(1,10,3))

>>> cer

1 Welcome

4 Welcome

7 Welcome

dtype: object

26 of 60

Program to create a Series object that stores the initial budget allocated (75000/- each ) for the four quarters of the year: Q1, Q2, Q3, Q4.

 

import pandas as pd

S=pd.Series(75000,index=['Q1','Q2','Q3','Q4'])

print(S)

Output

Q1 75000

Q2 75000

Q3 75000

Q4 75000

dtype: int64

27 of 60

Creating Series Objects – Additional Functionality:

(i) Specifying/Adding NaN values in a Series Object:

When we need to create a Series object of a certain size but do not have complete data, we can fill the missing data with a NaN (Not a Number) value. The legal empty value NaN is defined in the NumPy module: we can use np.nan to specify a missing value, or use None. (The older alias np.NaN was removed in NumPy 2.0.)

>>> import numpy as np

>>> S=pd.Series([10,"Hai",np.nan,2.3,np.nan])

>>> S

0 10

1 Hai

2 NaN

3 2.3

4 NaN

dtype: object
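With purely numeric data, None is also stored as NaN and the Series becomes float64. A short sketch (the values are illustrative) shows this, along with isna( ) to locate missing entries and count( ) to count only the non-NaN ones:

```python
import numpy as np
import pandas as pd

S = pd.Series([10, np.nan, 30, None])   # None is also stored as NaN here
print(S)
# 0    10.0
# 1     NaN
# 2    30.0
# 3     NaN
# dtype: float64
print(S.isna().tolist())   # [False, True, False, True]
print(S.count())           # 2 (non-NaN values only)
```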

28 of 60

(ii) Specifying index(es) as well as data with Series( ):

Both values and indexes are sequences. None is taken by default, if you skip these parameters.

Syntax:<Series Object> = pandas.Series(data=None,index=None)

>>> stu=["Kamal","Mahesh","Jhansi"]

>>> marks=[76,82,79]

>>> S=pd.Series(data=marks,index=stu)

>>> S

Kamal 76

Mahesh 82

Jhansi 79

dtype: int64

29 of 60

We can also use a loop (a list comprehension) to define the index sequence.

>>> S1=pd.Series(range(1,20,4),index=[vowel for vowel in 'aeiou'])

>>> S1

a 1

e 5

i 9

o 13

u 17

dtype: int64

Note: If we specify indexes explicitly using an index sequence, we must provide as many indexes as there are values in the data array; providing too few or too many indexes raises a ValueError.
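The length check can be sketched with the student list from the previous example and a marks list that is deliberately one value short; the flag raised is a helper introduced only for this illustration.

```python
import pandas as pd

stu = ["Kamal", "Mahesh", "Jhansi"]
marks = [76, 82]            # one value too few for three index labels

raised = False
try:
    S = pd.Series(data=marks, index=stu)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```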

30 of 60

(iii) Specify Data Type along with data and index:

<Series Object> = pandas.Series(data=None, index=None, dtype=None)

None is the default value for each of these parameters when no value is provided.

If we do not specify a datatype, the closest datatype that can store the given values is used. We can specify our own datatype by passing a NumPy datatype in the dtype argument.

>>> stu=["Kamal","Mahesh","Jhansi"]

>>> marks=[76,82,79]

>>> S=pd.Series(data=marks,index=stu,dtype=np.float64)

>>> S

Kamal 76.0

Mahesh 82.0

Jhansi 79.0

dtype: float64

31 of 60

(iv) Using a Mathematical Function/Expression to Create Data Array in Series( ):

<Series Object>=pandas.Series(index=None, data=<function/expression>)

>>> a=[5,10,15,20]

>>> S=pd.Series(data=a*2)

# a*2 repeats the Python list twice (list replication, not multiplication of each element)

>>> S

0 5

1 10

2 15

3 20

4 5

5 10

6 15

7 20

dtype: int64

32 of 60

>>> S=pd.Series(index=a,data=a*2)

ValueError: Length of values (8) does not match length of index (4)

 

>>> m=np.arange(9,13)

>>> m

array([ 9, 10, 11, 12])

>>> S2=pd.Series(index=m,data=m*2)

>>> S2

9 18

10 20

11 22

12 24

dtype: int32

33 of 60

>>> S3=pd.Series(index=m,data=m**2)

>>> S3

9 81

10 100

11 121

12 144

dtype: int32

Indexes need not be unique in a Pandas Series object. Duplicate indexes cause an error only if/when you perform an operation that requires unique indexes.

>>> val=[10.5,12,"Welcome"]

>>> S=pd.Series(data=val,index=['a','b','a'])

>>> S

a 10.5

b 12

a Welcome

dtype: object
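A sketch of what duplicate labels mean in practice, reusing the Series above: index.is_unique reports the duplicates, selecting 'a' returns both matching entries, and reindex( ), one operation that needs unique labels, raises a ValueError. The flag raised is only an illustration helper.

```python
import pandas as pd

S = pd.Series(data=[10.5, 12, "Welcome"], index=['a', 'b', 'a'])

print(S.index.is_unique)   # False
print(S['a'])              # a Series holding both entries labelled 'a'

raised = False
try:
    S.reindex(['a', 'b'])  # reindexing needs unique labels
except ValueError:
    raised = True
```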

34 of 60

Series Object Attributes: When we create a Series type object, all information related to it is available through attributes.

Syntax: <Series object>.<attribute name>

Some common attributes:

  • index: the index (axis labels) of the Series
  • values: the data of the Series as an ndarray or ndarray-like object, depending on the dtype
  • dtype: the dtype object (datatype) of the underlying data
  • shape: a tuple giving the shape of the underlying data
  • nbytes: the number of bytes in the underlying data
  • ndim: the number of dimensions of the underlying data
  • size: the number of elements in the underlying data
  • itemsize: the size of the dtype of each item (deprecated and later removed from pandas; use <Series object>.dtype.itemsize instead)
  • hasnans: True if there are any NaN values, otherwise False
  • empty: True if the Series object is empty, otherwise False

35 of 60

Consider the following Series Object:

>>> Marks=[34,33,np.nan,38,40]

>>> Exams=["CT1","CT2","CT3","CT4","CT5"]

>>> S=pd.Series(Marks,index=Exams)

>>> S

CT1 34.0

CT2 33.0

CT3 NaN

CT4 38.0

CT5 40.0

dtype: float64

 

(i) index :

>>> S.index

Index(['CT1', 'CT2', 'CT3', 'CT4', 'CT5'], dtype='object')

(ii) values:

>>> S.values

array([34., 33., nan, 38., 40.])

36 of 60

(iii) dtype:

>>> S.dtype

dtype('float64')

(iv) shape:

>>> S.shape

(5,)

(v) nbytes:

>>> S.nbytes # 5 elements x 8 bytes each (float64)

40

(vi) ndim:

>>> S.ndim # Series is One Dimensional

1

(vii) size:

>>> S.size # 5 elements

5

>>> S

CT1 34.0

CT2 33.0

CT3 NaN

CT4 38.0

CT5 40.0

dtype: float64

(viii) itemsize:

>>> S.itemsize

AttributeError: 'Series' object has no attribute 'itemsize'

>>> S.dtype.itemsize # size of one float64 item in bytes

8

(ix) hasnans:

>>> S.hasnans

True

(x) empty:

>>> S.empty

False

37 of 60

Other example related to index:

>>> S3=pd.Series(data=np.arange(5,25,4))

>>> S3.index

RangeIndex(start=0, stop=5, step=1)

>>> a=np.arange(9,13)

>>> S4=pd.Series(index=a,data=a*2)

>>> S4.index

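Int64Index([9, 10, 11, 12], dtype='int64')

(From pandas 2.0, Int64Index no longer exists; the index is displayed as a plain Index with an integer dtype.)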

 

Some functions

  • len( ): to get the total number of elements (including NaN values)
  • count( ): to get the count of non-NaN values in a Series object
  • type( ): to know the data type of an object

>>> len(S)

5

>>> S.count()

4

>>> type(S)

<class 'pandas.core.series.Series'>

38 of 60

Accessing a Series Object and its Elements

We can access Series indexes separately, data separately, also can access individual elements and slices.

Let us take some example Series Objects.

>>>S1=pd.Series(data=[5,6,7,8,9,10,11,12],

index=['May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])

>>> S2=pd.Series(data=[75,72,89],

index=['Raj','Kamal','Nani'])

>>> S3=pd.Series([87,99,52],index=[11,12,13])

39 of 60

(a) Accessing Individual Elements: With index value or with its position.

Syntax:<Series Object name>[<valid index>]

>>> S1['Jul']

7

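Note: (1) If the Series object has duplicate indexes, giving that index returns all the entries with that index.

(2) If the indexes are strings, a position number also works inside [ ] (newer pandas versions warn that this use is deprecated; S1.iloc[2] is the recommended form). If the indexes are integers, the value inside [ ] is always treated as an index label, so a number that is not a label raises a KeyError.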

>>> S1[2]

7

>>> S3[11]

87

>>> S3[0]

KeyError
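The label/position ambiguity above disappears with the accessors loc (always by label) and iloc (always by position). A sketch using the same S1 and S3 objects:

```python
import pandas as pd

S1 = pd.Series(data=[5, 6, 7, 8, 9, 10, 11, 12],
               index=['May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
S3 = pd.Series([87, 99, 52], index=[11, 12, 13])

print(S1.loc['Jul'])   # 7   (by index label)
print(S1.iloc[2])      # 7   (by position)
print(S3.loc[11])      # 87  (11 is a label)
print(S3.iloc[0])      # 87  (position 0, so no KeyError)
```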

40 of 60

(b) Extracting Slices from Series Object:

Slicing with position numbers takes place position-wise, not index-wise, in a Series object.

All individual elements have position numbers starting from 0 onwards.

Syntax: <object>[start:end:step]

(the end position is excluded)

The slice of a Series object is also a Pandas Series object.

>>> S1[1:5]

Jun 6

Jul 7

Aug 8

Sep 9

dtype: int64
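Slices can also be taken by label with loc, and there the end label is included, unlike position slices. A sketch contrasting iloc and loc on the same S1:

```python
import pandas as pd

S1 = pd.Series(data=[5, 6, 7, 8, 9, 10, 11, 12],
               index=['May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

by_position = S1.iloc[1:4]        # positions 1, 2, 3: end position excluded
by_label = S1.loc['Jun':'Sep']    # labels Jun..Sep: end label included

print(by_position.index.tolist())  # ['Jun', 'Jul', 'Aug']
print(by_label.index.tolist())     # ['Jun', 'Jul', 'Aug', 'Sep']
```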

41 of 60

>>> S1[10:12]

#position wise, not index wise

Series([], dtype: int64)

 

>>> S2[::-1] #slice with values reversed

Nani 89

Kamal 72

Raj 75

dtype: int64

 

>>> S3

11 87

12 99

13 52

dtype: int64

>>> S1

May 5

Jun 6

Jul 7

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64

 

>>> S1[0:2:2]

May 5

dtype: int64

 

 

42 of 60

>>> S1[2:6:3]

Jul 7

Oct 10

dtype: int64

 

>>> S1[1:9:2]

Jun 6

Aug 8

Oct 10

Dec 12

dtype: int64

 

>>> S1[0::2]

May 5

Jul 7

Sep 9

Nov 11

dtype: int64

>>> S1[::-2]

Dec 12

Oct 10

Aug 8

Jun 6

dtype: int64

 

>>> S1[21:2:1]

Series([], dtype: int64)

 

>>> S1[6:1:-2]

Nov 11

Sep 9

Jul 7

dtype: int64

43 of 60

Operations on Series Object

 

(a) Modifying Elements of Series Object:

Syntax: <SeriesObject>[<index>]=<new data value>

The above assignment changes the data value of the given index in the Series object.

<SeriesObject>[start:stop]=<new data value>

The above assignment replaces all the values falling in the given slice.

>>> S2=pd.Series(data=[75,72,89],index=['Raj','Kamal','Nani'])

>>> S2

Raj 75

Kamal 72

Nani 89

dtype: int64

44 of 60

>>> S2["Raj"]=94

>>> S2[1]=99

>>> S2

Raj 94

Kamal 99

Nani 89

dtype: int64

>>> S1[1:6]=25
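The same modifications can be sketched with iloc for positions, which avoids the deprecation warning newer pandas versions give for positional [ ] on a string index. The sketch recreates S1 and S2 as defined earlier; the slice assignment sets positions 1 to 5 (Jun to Oct) to 25.

```python
import pandas as pd

S2 = pd.Series(data=[75, 72, 89], index=['Raj', 'Kamal', 'Nani'])
S2['Raj'] = 94        # by index label
S2.iloc[1] = 99       # by position

S1 = pd.Series(data=[5, 6, 7, 8, 9, 10, 11, 12],
               index=['May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
S1.iloc[1:6] = 25     # positions 1 to 5 (Jun to Oct) all become 25

print(S2.tolist())    # [94, 99, 89]
print(S1.tolist())    # [5, 25, 25, 25, 25, 25, 11, 12]
```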

45 of 60

Renaming Indexes:

Syntax:<Object>.index=<new index array>

>>> S3=pd.Series([87,99,52],index=[11,12,13])

>>> S3

11 87

12 99

13 52

dtype: int64

>>> S3.index=['First','Second','Third']

>>> S3

First 87

Second 99

Third 52

dtype: int64

 

>>> S3.index=['One','Two']

ValueError
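Assigning to index replaces every label at once. To change only some labels, the Series method rename( ) accepts a dictionary mapping old labels to new ones and returns a new Series. A sketch (the label 'Middle' is illustrative):

```python
import pandas as pd

S3 = pd.Series([87, 99, 52], index=[11, 12, 13])
S3.index = ['First', 'Second', 'Third']     # replaces the whole index

S4 = S3.rename({'Second': 'Middle'})        # relabels only 'Second'; S3 is unchanged

print(S3.index.tolist())   # ['First', 'Second', 'Third']
print(S4.index.tolist())   # ['First', 'Middle', 'Third']
```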

46 of 60

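head( ) and tail( ) functions:

head( ) is used to fetch the first n rows from a Pandas object and tail( ) returns the last n rows from a Pandas object.

Syntax:

<pandas object>.head([n])

<pandas object>.tail([n])

Note: If you do not provide any value for n, head( ) and tail( ) return the first 5 and last 5 rows respectively. If n is larger than the number of rows, all rows are returned. A negative n in head( ) returns all rows except the last |n| rows; a negative n in tail( ) returns all rows except the first |n| rows.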

>>> S1

May 5

Jun 6

Jul 7

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64

47 of 60


>>> S1.head(3)

May 5

Jun 6

Jul 7

dtype: int64

 

>>> S1.head()

May 5

Jun 6

Jul 7

Aug 8

Sep 9

dtype: int64

 

>>> S1.head(77)

May 5

Jun 6

Jul 7

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64 

>>> S1.head(-2)

May 5

Jun 6

Jul 7

Aug 8

Sep 9

Oct 10

dtype: int64

>>> S1.tail(3)

Oct 10

Nov 11

Dec 12

dtype: int64

 

>>> S1.tail()

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64

>>> S1.tail(22)

May 5

Jun 6

Jul 7

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64

 

>>> S1.tail(-3)

Aug 8

Sep 9

Oct 10

Nov 11

Dec 12

dtype: int64

48 of 60

Vector operations on Series Object:

Vector operations mean that a function or expression is applied individually to each item of the object. As Series objects are built upon NumPy arrays (ndarrays), they also support vectorized operations just like ndarrays.

>>> S=pd.Series([2,3,4,5])

>>> S

0 2

1 3

2 4

3 5

dtype: int64

>>> S+2

0 4

1 5

2 6

3 7

dtype: int64 

>>> S-1

0 1

1 2

2 3

3 4

dtype: int64 

>>> S*3

0 6

1 9

2 12

3 15

dtype: int64

>>> S/2

0 1.0

1 1.5

2 2.0

3 2.5

dtype: float64 

>>> S>3

0 False

1 False

2 True

3 True

dtype: bool

>>> S5=pd.Series([2,3,4,5])

>>> S6=S5**2

>>> S6

0 4

1 9

2 16

3 25

dtype: int64
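NumPy functions and combined expressions are vectorized in the same way. A short sketch with illustrative values:

```python
import numpy as np
import pandas as pd

S = pd.Series([4, 9, 16, 25])
roots = np.sqrt(S)          # a NumPy function is applied to every element
even = S % 2 == 0           # element-wise remainder, then element-wise comparison

print(roots.tolist())       # [2.0, 3.0, 4.0, 5.0]
print(even.tolist())        # [True, False, True, False]
```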

49 of 60

50 of 60

Arithmetic on Series Object

 

We can do arithmetic such as addition, subtraction, multiplication and division with two Series objects; the result is calculated on the corresponding items of the two objects given in the expression.

The operation is performed only on matching indexes; for non-matching indexes, the result is NaN (Not a Number).

If the data items of two matching indexes are not compatible for the operation, an error (TypeError) is raised.
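The original slide's Ob1 and Ob2 were not preserved, so the sketch below uses hypothetical objects that overlap only on the labels 'b' and 'c', to show index alignment during addition.

```python
import pandas as pd

# Hypothetical objects: the slide's original Ob1 and Ob2 were not preserved
Ob1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
Ob2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

Ob3 = Ob1 + Ob2       # aligned on index labels before adding
print(Ob3)
# a     NaN
# b    21.0
# c    32.0
# d     NaN
# dtype: float64
```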

51 of 60

52 of 60

>>> Ob4=pd.Series(["Welcome","to","World"])

>>> Ob5=pd.Series(["I","am","Human"])

>>> Ob1+Ob4

TypeError: unsupported operand type(s) for +: 'int' and 'str'.

Note: For indexes that do not match, the result is NaN.

53 of 60

Note: When we perform arithmetic operations on two Series objects, the data is aligned on the basis of matching indexes (this is called data alignment in Pandas) and then the arithmetic is performed; for non-overlapping indexes, the result is NaN (Not a Number).

We can store the result of object arithmetic in another object, which will also be a Series object.

>>>Ob3=Ob1+Ob2
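If NaN is not wanted for non-matching indexes, the method form add( ) takes a fill_value that stands in for the missing side. A sketch with hypothetical Ob1 and Ob2 (the slide's originals were not preserved):

```python
import pandas as pd

# Hypothetical Ob1 and Ob2 (the slide's originals were not preserved)
Ob1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
Ob2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

Ob3 = Ob1 + Ob2                     # NaN for 'a' and 'd'
Ob4 = Ob1.add(Ob2, fill_value=0)    # a missing side is treated as 0 instead

print(Ob4.tolist())   # [10.0, 21.0, 32.0, 3.0]
```

The operator form is shorter, while add( ) is the one to reach for when missing labels should count as a neutral value.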

54 of 60

Filtering Entries:

We can filter entries from a Series object using expressions of Boolean type (i.e., expressions that result in the Boolean values True/False).

When we apply a comparison operator directly on a Pandas Series object, it works like a vectorized operation and applies the check to each individual element of the Series object.

Syntax: <Series Object>[<Boolean Expression on Series Object>]

Ex: >>> S=pd.Series([5,10,20,25,30])

[Figure: the Series object, the result of the vectorized comparison, and the filtered result]
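The comparison used in the lost figure is unknown, so the sketch below picks S > 15 purely for illustration. The comparison yields a Boolean Series, and indexing with it keeps only the True rows, with their original index labels.

```python
import pandas as pd

S = pd.Series([5, 10, 20, 25, 30])

mask = S > 15         # vectorized comparison gives a Boolean Series
result = S[mask]      # keeps only the rows where mask is True

print(mask.tolist())  # [False, False, True, True, True]
print(result)
# 2    20
# 3    25
# 4    30
# dtype: int64
```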

55 of 60

Sorting Series Values

 

We can sort the values of a Series object on the basis of values and indexes.

Sorting on the Basis of Values:

Syntax:

<Series object>.sort_values([ascending=True/False])

The argument ascending is optional and if skipped, it takes the value True by default.

 >>> S=pd.Series([2500,1200,1700,-500,700])

>>> S.sort_values(ascending=True)

# or >>> S.sort_values( )

>>> S.sort_values(ascending=False)

# will display in descending order
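The outputs of both calls above can be sketched as follows; note that each value keeps its original index label after sorting.

```python
import pandas as pd

S = pd.Series([2500, 1200, 1700, -500, 700])

asc = S.sort_values()                  # ascending (default)
desc = S.sort_values(ascending=False)  # descending

print(asc)
# 3    -500
# 4     700
# 1    1200
# 2    1700
# 0    2500
# dtype: int64
print(desc.tolist())   # [2500, 1700, 1200, 700, -500]
```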

56 of 60

Note : To make the sorted values permanent in the Series object, use “inplace=True”.

Ex:

>>> S.sort_values(ascending=True,inplace=True)

# or >>> S.sort_values(inplace=True)

# will sort the Series in ascending order permanently.

Sorting on the Basis of Indexes: sort_index()

Syntax: <Series object>.sort_index([ascending=True/False])

The argument ascending is optional and if skipped, it takes the value True by default. 

Ex: Obj=pd.Series([2500,-500,3500,1500],index=['C','B','D','A'])

57 of 60

Note : To make the sorted values permanent in the Series object, use “inplace=True”.

>>> Obj.sort_index(ascending=False,inplace=True)
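A sketch of sort_index( ) on the Obj defined above: without inplace a new sorted Series is returned, while inplace=True returns None and reorders Obj itself.

```python
import pandas as pd

Obj = pd.Series([2500, -500, 3500, 1500], index=['C', 'B', 'D', 'A'])

asc = Obj.sort_index()                  # new Series; Obj unchanged so far
print(asc.index.tolist())               # ['A', 'B', 'C', 'D']

result = Obj.sort_index(ascending=False, inplace=True)
print(result)                           # None: inplace=True changes Obj itself
print(Obj.index.tolist())               # ['D', 'C', 'B', 'A']
print(Obj.tolist())                     # [3500, 2500, -500, 1500]
```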

58 of 60

Difference between NumPy Arrays and Series Objects

Vectorized operations
  • ndarrays: We can perform vectorized operations on two ndarrays only if their shapes match (or can be broadcast together); otherwise a ValueError is raised.
  • Series objects: The data of the two Series objects is aligned on matching indexes and the operation is performed on them; for non-matching indexes, NaN is returned.

Indexes
  • ndarrays: The indexes are always numeric, starting from 0 onwards.
  • Series objects: Series objects can have any type of index, including numbers (not necessarily starting from 0), letters, labels, strings, etc.
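The first difference can be sketched side by side with illustrative values: adding arrays of lengths 3 and 2 fails, while adding Series of the same lengths aligns them by index. The flag raised is only an illustration helper.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
b = np.array([10, 20])
raised = False
try:
    a + b                       # shapes (3,) and (2,) cannot be combined
except ValueError as err:
    raised = True
    print("ndarray:", err)

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20])
total = s1 + s2                 # aligned by index; index 2 has no partner
print(total)
# 0    11.0
# 1    22.0
# 2     NaN
# dtype: float64
```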

59 of 60

Reindexing: creates a similar object with the indexes in a different order.

<Series Object>=<Object>.reindex(<sequence with new order of indexes>)

>>> Obj2=Obj1.reindex(['C','A','B','D'])

>>> Obj3=Obj1.reindex(['D','B','Mar','Apr'])

With this, the same data values and their indexes are stored in the new object in the order of indexes given in reindex( ). Labels that do not exist in the original object (such as 'Mar' and 'Apr' above) get NaN as their value.
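The slide's Obj1 was not preserved, so this sketch assumes a hypothetical Obj1 with the labels 'A' to 'D' (values borrowed from the sorting example) and runs both reindex( ) calls.

```python
import pandas as pd

# Hypothetical Obj1: the slide's original object was not preserved
Obj1 = pd.Series([2500, -500, 3500, 1500], index=['C', 'B', 'D', 'A'])

Obj2 = Obj1.reindex(['C', 'A', 'B', 'D'])       # same labels, new order
Obj3 = Obj1.reindex(['D', 'B', 'Mar', 'Apr'])   # 'Mar' and 'Apr' are new labels

print(Obj2.tolist())          # [2500, 1500, -500, 3500]
print(Obj3.isna().tolist())   # [False, False, True, True]: Mar and Apr hold NaN
```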

60 of 60

Dropping Entries from an Axis

To remove an entry from a Series object, use drop( ). It returns a new object unless inplace=True is given.

Syntax: <Series Object>.drop(<index to be removed>)

>>> Obj.drop('C',inplace=True)
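A sketch of both forms of drop( ) on the Obj used in the sorting example: passing a list removes several labels and returns a new Series, while inplace=True changes Obj itself.

```python
import pandas as pd

Obj = pd.Series([2500, -500, 3500, 1500], index=['C', 'B', 'D', 'A'])

Obj2 = Obj.drop(['B', 'D'])     # drops several labels; returns a new Series
print(Obj2.index.tolist())      # ['C', 'A']
print(len(Obj))                 # 4: Obj itself is unchanged

Obj.drop('C', inplace=True)     # modifies Obj in place
print(Obj.index.tolist())       # ['B', 'D', 'A']
```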