1 of 11

Introduction to Pandas

Winter 2025

Adrian Salguero

2 of 11

Announcements

  • Homework 6 due Friday at 11:59pm
  • Coding Practice 8 due tonight at 11:59pm
  • Section tomorrow will be a Final Exam Review
    • TAs will go over practice problems
  • Complete the Course Evaluation if you haven’t already done so
  • Friday’s lecture (3/14) will also be a Final Exam Review
    • Come with questions!

3 of 11

Topics for Today

  • What is pandas?
  • Basics of pandas
    • Installing pandas
    • DataFrames
    • Series
  • Importing and using a csv file in pandas
  • Using pandas for visualization

4 of 11

Dataset for Today

5 of 11

What is pandas?

  • Pandas is a Python library (or module) used primarily for data analysis
  • Drawbacks
    • It can use a lot of memory and run slowly, especially when working with very large datasets
  • The primary data structures used in pandas are the DataFrame and the Series
    • These objects give us access to useful methods for working with data

6 of 11

Installing pandas

Two common ways to install pandas

  1. Installing Anaconda
    • Pandas comes with Anaconda, so no additional installation is required
    • Write import pandas as pd in your Python file to use it

  2. Using the Python package installer (pip)
    • Run pip install pandas in a terminal
    • Write import pandas as pd in your Python file to use it
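
One quick way to confirm that either installation route worked is to import the library and print its version. This is only a minimal sketch; the version string printed depends on your own installation.

```python
# Check that pandas can be imported after installing it.
import pandas as pd

# pd.__version__ is a string; its exact value depends on your setup.
print(pd.__version__)
```

If the import fails with a ModuleNotFoundError, pandas is most likely not installed in the Python environment you are running.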

7 of 11

DataFrames and Series

  • A DataFrame is a two-dimensional structure (rows and columns) used to store and work with data
  • A Series is a one-dimensional structure that represents a single column or row
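
The relationship between the two structures can be sketched with a small example. The names and values below are hypothetical and are not the lecture's dataset; they only show that picking one column or one row out of a DataFrame gives a Series.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column label.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [21, 19, 22],
})

ages = df["age"]        # selecting one column gives a Series
first_row = df.loc[0]   # selecting one row also gives a Series

print(type(df).__name__)         # DataFrame
print(type(ages).__name__)       # Series
print(type(first_row).__name__)  # Series
```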

[Figure: diagram of a table with columns column1 through column5, with the labels "DataFrame" and "Series"]

8 of 11

Reading csv file into DataFrame

  • Without pandas, we have to decide how to store csv file data in our program
    • Which data structure to use: list, dictionary, set, tuple, etc…
    • Parse the data correctly (remove whitespace and split on the commas) and store it in a usable form
  • Pandas provides tools to read csv files (and other file types) into a DataFrame
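
As a rough illustration of how much work pandas takes over, the sketch below reads csv data with pd.read_csv. The csv text, column names, and the filename mentioned in the comment are all made up for this example; an in-memory string stands in for a file so the code runs on its own.

```python
import io
import pandas as pd

# Hypothetical csv text standing in for a file on disk.
csv_text = """name,major,credits
Ana,Biology,45
Ben,Math,30
Cy,History,60
"""

# read_csv accepts a file path or a file-like object.
# With a real file this could be pd.read_csv("students.csv") (hypothetical filename).
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # shows the type pandas inferred for each column
```

The header row becomes the column labels, and the splitting on commas is handled for us.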

9 of 11

Useful pandas commands

  • DataFrame.index = returns the index (row) labels
  • DataFrame.columns = returns all the column labels
  • DataFrame.head(n=number) = returns the first 5 rows by default; if n is given, returns the first n rows
  • DataFrame.tail(n=number) = returns the last 5 rows by default; if n is given, returns the last n rows
  • DataFrame.shape = returns the dimensions of the DataFrame (rows, columns)
  • DataFrame.iterrows() = allows you to iterate through the rows of the DataFrame as tuples → (Index, Series)
  • DataFrame.set_index(column) = returns a new DataFrame whose index (row labels) is set to an existing column
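
The commands above can be tried on a small DataFrame. The data here is hypothetical and chosen only so that the default of 5 rows for head() is visible.

```python
import pandas as pd

# Hypothetical data with six rows.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay"],
    "credits": [45, 30, 60, 15, 90, 75],
})

print(df.index)     # row labels (0 through 5 by default)
print(df.columns)   # column labels
print(df.shape)     # (6, 2)
print(df.head())    # first 5 rows
print(df.tail(2))   # last 2 rows

# iterrows() yields (index, Series) pairs, one per row.
for idx, row in df.iterrows():
    print(idx, row["name"], row["credits"])

# set_index returns a new DataFrame; df itself is unchanged.
by_name = df.set_index("name")
print(by_name.loc["Cy", "credits"])  # 60
```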

10 of 11

Slicing and Filtering a DataFrame

  • Much like Excel or Google Sheets allows you to filter a spreadsheet, you can do the same in pandas!

Check for a single condition

new_df = df[df['column'] == "value"]

Check for multiple conditions

and (&)

new_df = df[(df['column1'] == "value1") & (df['column2'] > 'value2')]

or (|)

new_df = df[(df['column1'] == "value1") | (df['column2'] > 'value2')]

not (~)

new_df = df[~(df['column2'] > 'value2')]
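
The filtering patterns on this slide can be run against a small DataFrame. The column names and values below are hypothetical; the sketch only shows which rows each pattern keeps.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "major": ["Math", "Biology", "Math", "History"],
    "credits": [45, 70, 60, 15],
})

# Single condition: rows where major is "Math".
math_only = df[df["major"] == "Math"]

# and (&): Math majors with more than 50 credits.
math_and_50 = df[(df["major"] == "Math") & (df["credits"] > 50)]

# or (|): Math majors, or anyone with more than 50 credits.
math_or_50 = df[(df["major"] == "Math") | (df["credits"] > 50)]

# not (~): rows that do NOT have more than 40 credits.
not_over_40 = df[~(df["credits"] > 40)]

print(math_only["name"].tolist())    # ['Ana', 'Cy']
print(math_and_50["name"].tolist())  # ['Cy']
print(math_or_50["name"].tolist())   # ['Ana', 'Ben', 'Cy']
print(not_over_40["name"].tolist())  # ['Dee']
```

The parentheses around each condition matter: in Python, & and | bind more tightly than comparisons such as == and >, so leaving them out changes how the expression is evaluated.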

11 of 11

Visualization through pandas

  • DataFrames have built-in methods for creating different kinds of plots
    • DataFrame.plot.bar() = bar graphs
    • DataFrame.plot.barh() = horizontal bar graphs
    • DataFrame.plot.hist() = histograms
    • DataFrame.plot.box() = box plots
    • DataFrame.plot.scatter(x, y) = scatter plots (x and y name the columns to plot)
    • Many others…
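
A bar graph, for instance, might be produced as in the sketch below. The data and column names are hypothetical, and the example assumes matplotlib is installed, since pandas uses it for plotting by default. The "Agg" backend is selected only so that the sketch runs without a display; in a notebook or script you would typically skip that line and call matplotlib.pyplot.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "credits": [45, 70, 60],
})

# x and y name the columns used for the bar labels and bar heights.
# The call returns a matplotlib Axes object that can be customized further.
ax = df.plot.bar(x="name", y="credits")
ax.set_ylabel("Credits")
```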
