Basics of Python
Hands-on Workshop
A Few Prerequisites
🍃 Open marimo, create an account, and log in
🌐 Download the data from the Access Now Shutdown Tracker Optimization Project (STOP)
Note:
A Chrome or Firefox browser is recommended.
https://kutt.it/python-basics
What is Python?
(we’ll focus on the second one)
What are notebooks?
With cloud-hosted notebooks such as molab or Colab, we can skip the installation hassles.
But if you decide Python is your jam, you might consider learning how to write and run Python on your computer.
See also other GIJN sessions:
Marimo and Molab
What is pandas?
Open-source library for accessing and analyzing data
import pandas as pd
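To see what pandas works with, here is a tiny table (a "DataFrame") built by hand. The country labels and numbers are made-up placeholders, not STOP data.

```python
import pandas as pd

# Hypothetical example data, not from the STOP dataset
df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "shutdowns": [3, 2, 1],
})

print(df.shape)             # (3, 2): rows, columns
print(df.columns.tolist())  # ['country', 'shutdowns']
print(df.head())            # first few rows
```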
From an actual post on the NICAR-L listserv:
Here's an example of the formula for calculating % price differential vs. Area score:
=IFERROR(IF(K6=0,0,((COUNT([% price differential vs. Area])+1)-(RANK(R6,[% price differential vs. Area],0)))/COUNT([% price differential vs. Area])*100*(1-IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$3,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$3),PENALTIES!$D$3,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$4,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$4),PENALTIES!$D$4,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$5,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$5),PENALTIES!$D$5,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$6,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$6),PENALTIES!$D$6,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$7,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$7),PENALTIES!$D$7,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$8,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$8),PENALTIES!$D$8,IF('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$9,PENALTIES!$D$9,""))))))))),0)
Why Python?
Learning a programming language can help you do your analysis in a cleaner, more readable, and more reproducible way.
Scrape
Get Faster
Bigger Data
Automate
Extract from PDFs
Let's try it out!
marimo: Open my notebook link and click on "fork and run"
Types of Variables
String
var1 = "This class is at GIJN in Kuala Lumpur."
var2 = "My name is " + "Laura"
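A few more things you can do with strings: measure them, change their case, and drop variables into text with an f-string. The `city` and `var3` names are just for illustration.

```python
var1 = "This class is at GIJN in Kuala Lumpur."
var2 = "My name is " + "Laura"   # + joins (concatenates) two strings

# f-strings insert a variable's value into text
city = "Kuala Lumpur"
var3 = f"Welcome to {city}!"

print(len(var2))      # 16 characters
print(var2.upper())   # MY NAME IS LAURA
print(var3)           # Welcome to Kuala Lumpur!
```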
Types of Variables
Integer / Float
int1 = 100000
float1 = 0.5
my_sum = 1 + 2 / 6.5
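Python follows the usual order of operations, so `my_sum` is 1 + (2 / 6.5), not (1 + 2) / 6.5. Parentheses change the order. The `grouped` variable is added here only for comparison.

```python
int1 = 100000
float1 = 0.5

# Division happens before addition, just like in math class
my_sum = 1 + 2 / 6.5        # 1 + 0.3077... = 1.3077...
grouped = (1 + 2) / 6.5     # parentheses first: 3 / 6.5 = 0.4615...

print(type(int1))           # <class 'int'>
print(type(float1))         # <class 'float'>
print(round(my_sum, 4))     # 1.3077
print(round(grouped, 4))    # 0.4615
print(7 // 2, 7 % 2)        # floor division and remainder: 3 1
```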
Structures
List
my_list = ["hello", "hi", "hola", "olá"]
another_list = [1, 5, 10, 15]
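Lists keep items in order, so you can pull items out by position (counting starts at 0), slice them, and add to them.

```python
my_list = ["hello", "hi", "hola", "olá"]
another_list = [1, 5, 10, 15]

print(my_list[0])      # hello (the first item is at position 0)
print(my_list[-1])     # olá (negative positions count from the end)
print(my_list[1:3])    # ['hi', 'hola'] (start included, end excluded)

my_list.append("bonjour")   # add an item to the end
print(len(my_list))         # 5
print(sum(another_list))    # 31
```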
Structures
Dictionary
fruit_dict = {'Fruit': 'Orange', 'Weight': 10}
shoe_dict = {'Size': 9, 'Brand': 'Saoi'}
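Dictionaries store values under keys, so you look things up by name rather than by position. The `'Color'` key below is an illustrative addition.

```python
fruit_dict = {'Fruit': 'Orange', 'Weight': 10}
shoe_dict = {'Size': 9, 'Brand': 'Saoi'}

print(fruit_dict['Fruit'])                 # Orange: look up a value by its key
fruit_dict['Color'] = 'orange'             # add a new key-value pair
print(shoe_dict.get('Color', 'unknown'))   # unknown: .get() avoids an error for missing keys
print(list(shoe_dict.keys()))              # ['Size', 'Brand']
```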
Logic
Conditional Statement
score = 1
if score > 2:
    print("Win")
else:
    print("Lose")
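With `elif` you can check more than two cases. Here the check is wrapped in a function so we can try several scores; the "Draw" rule is made up for the example.

```python
def outcome(score):
    """Hypothetical rule: above 2 wins, exactly 2 draws, anything else loses."""
    if score > 2:
        return "Win"
    elif score == 2:
        return "Draw"
    else:
        return "Lose"

print(outcome(1))  # Lose
print(outcome(2))  # Draw
print(outcome(5))  # Win
```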
Logic
For Loop
fruit_list = ['apple', 'pear', 'lychee']
for fruit in fruit_list:
    print(fruit)
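Two common loop patterns: numbering items as you go with `enumerate()`, and building up a new list one item at a time.

```python
fruit_list = ['apple', 'pear', 'lychee']

# enumerate() hands you a counter alongside each item
for i, fruit in enumerate(fruit_list, start=1):
    print(i, fruit)

# A loop can also collect results into a new list
lengths = []
for fruit in fruit_list:
    lengths.append(len(fruit))
print(lengths)  # [5, 4, 6]
```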
Row-wise filtering
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
lgbt = df[df.users_targeted == 'LGBTQ groups']
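The same filter works on a small hand-made table, which also shows how to combine conditions. Only the column names follow the slide; the rows are invented. Wrap each condition in parentheses when you join them with `&` (and) or `|` (or).

```python
import pandas as pd

# Made-up rows; only the column names follow the slide
df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country C"],
    "users_targeted": ["LGBTQ groups", "Students", "LGBTQ groups", "Journalists"],
})

# One condition (same as df[df.users_targeted == 'LGBTQ groups'])
lgbt = df[df["users_targeted"] == "LGBTQ groups"]

# Two conditions joined with & (and); each one in parentheses
lgbt_a = df[(df["users_targeted"] == "LGBTQ groups") & (df["country"] == "Country A")]

# Keep rows whose value is any item in a list
several = df[df["users_targeted"].isin(["Students", "Journalists"])]

print(len(lgbt), len(lgbt_a), len(several))  # 2 1 2
```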
Selecting columns of data
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
country_and_justification = df[['country', 'gov_justification']]
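Why the double brackets? The inner brackets are a Python list of column names, and selecting with a list gives you back a table (DataFrame). A single column name gives you one column (a Series). The rows below are made up.

```python
import pandas as pd

# Made-up rows; column names follow the slides
df = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "gov_justification": ["Exams", "Unknown"],
    "start_date": ["2024-01-05", "2024-03-10"],
})

# A list of column names returns a DataFrame
country_and_justification = df[['country', 'gov_justification']]

# One column name returns a Series
countries = df['country']

print(type(country_and_justification).__name__)  # DataFrame
print(type(countries).__name__)                  # Series
```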
Sorting data
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
oldest = df.sort_values('start_date', ascending=True)
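One caution: if `start_date` is read in as text, pandas sorts it alphabetically, which only matches date order for formats like YYYY-MM-DD. Converting the column to real dates first makes the sort safe. The month/day/year format below is an assumption for illustration; check how dates look in your file.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Country B", "Country A", "Country C"],
    "start_date": ["01/15/2024", "12/01/2023", "06/30/2024"],  # made-up, month/day/year text
})

# As text, "12/01/2023" would sort AFTER "01/15/2024" (character by character).
# Convert to real dates first; the format string is an assumption about the data.
df["start_date"] = pd.to_datetime(df["start_date"], format="%m/%d/%Y")

oldest = df.sort_values("start_date", ascending=True)   # earliest first
newest = df.sort_values("start_date", ascending=False)  # latest first

print(oldest["country"].tolist())  # ['Country A', 'Country B', 'Country C']
```

`pd.read_csv` can also do the conversion while loading, via its `parse_dates` parameter.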
Grouping data – counts
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
counts = df.value_counts(subset=['country'])
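Two other common ways to count rows per country: `value_counts()` on a single column, or `groupby()` followed by `size()`. The `groupby` form is handy later when you want sums or averages instead of counts. The rows are made up.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Country A", "Country B", "Country A", "Country A"]})

# One column's value counts, most common first
counts = df["country"].value_counts()

# Same numbers via groupby(), sorted by country name
counts_grouped = df.groupby("country").size()

print(counts["Country A"])          # 3
print(counts_grouped["Country B"])  # 1
```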
Adding a new column to the data
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
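df2 = df.copy()  # a separate copy; plain df2 = df would just be a second name for the same table
df2['today'] = pd.to_datetime('today').normalize()  # today's date at midnight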
Adding a new column to the data
import pandas as pd
df = pd.read_csv('KeepItOn_2024.csv')
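df2 = df.copy()
df2['start_date'] = pd.to_datetime(df2['start_date'])  # text -> real dates
df2['today'] = pd.to_datetime('today').normalize()
# Days from today to each start date (negative = in the past)
df2['diff'] = (df2['start_date'] - df2['today']).dt.days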
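Because "today" changes every day, here is the same calculation with a fixed date so the result is predictable. The dates are made up. Flipping the subtraction gives "days ago" as positive numbers.

```python
import pandas as pd

df = pd.DataFrame({"start_date": ["2024-01-01", "2024-03-01"]})  # made-up dates
df2 = df.copy()
df2["start_date"] = pd.to_datetime(df2["start_date"])

# Fixed "today" for a reproducible result (the slide uses pd.to_datetime('today'))
df2["today"] = pd.Timestamp("2024-03-11")

df2["diff"] = (df2["start_date"] - df2["today"]).dt.days      # negative: in the past
df2["days_ago"] = (df2["today"] - df2["start_date"]).dt.days  # positive version

print(df2["diff"].tolist())      # [-70, -10]
print(df2["days_ago"].tolist())  # [70, 10]
```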
Appendix: Basics of Python
Use case: Analyze data
Popular tools: pandas
Use case: Get alerted to newsworthy items
Is anything new/newsworthy here? Get pinged via email, Slack, Twitter, text message, whatever.
Example: Slack me if the nuclear power plant near Omaha had a “reportable event” today or yesterday.
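A rough sketch of the alert idea using only the standard library. It assumes you already have a list of events (for example, scraped daily) as dicts with hypothetical `plant` and `date` keys, and a Slack incoming-webhook URL, which accepts a JSON body with a `text` field. The URL and plant names below are placeholders, and the usage part only prints instead of posting.

```python
import datetime as dt
import json
import urllib.request

# Placeholder: a real webhook URL comes from your Slack workspace and should stay secret
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def recent_events(events, plant, today):
    """Return events for `plant` dated today or yesterday (hypothetical event format)."""
    yesterday = today - dt.timedelta(days=1)
    return [e for e in events if e["plant"] == plant and e["date"] in (today, yesterday)]

def post_to_slack(text, url=WEBHOOK_URL):
    """POST `text` to a Slack incoming webhook as JSON; returns the HTTP status."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage with made-up events
today = dt.date(2024, 5, 2)
events = [
    {"plant": "Plant X", "date": dt.date(2024, 5, 1)},
    {"plant": "Plant X", "date": dt.date(2024, 4, 20)},
    {"plant": "Plant Y", "date": dt.date(2024, 5, 2)},
]
hits = recent_events(events, "Plant X", today)
if hits:
    message = f"{len(hits)} new event(s) for Plant X"
    print(message)  # swap in post_to_slack(message) once WEBHOOK_URL is real
```

Run it on a schedule (cron, GitHub Actions, etc.) and you only hear about it when something turns up.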
Use case: Scrape a website
Editor: ahhhhhhh breaking news, I need a spreadsheet with the names and locations of every U.S. nuclear power plant that has a pressurized water reactor like now
You: *cracks knuckles*
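Once you have a page's HTML, scraping is mostly pulling rows out of tables. This sketch uses Python's built-in `html.parser` on a made-up table so it runs without a network connection; in practice many people download pages with `requests` and parse them with BeautifulSoup, or use `pandas.read_html` for simple tables.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# Made-up page content; in practice you would download the HTML first
html = """
<table>
  <tr><th>Plant</th><th>State</th><th>Reactor type</th></tr>
  <tr><td>Plant X</td><td>NE</td><td>PWR</td></tr>
  <tr><td>Plant Y</td><td>IL</td><td>BWR</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
parser.close()

header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
pwr_only = [r for r in records if r["Reactor type"] == "PWR"]
print(pwr_only)  # [{'Plant': 'Plant X', 'State': 'NE', 'Reactor type': 'PWR'}]
```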
Use case: Extract data from a PDF
Our favorite tools are NaturalPDF, pdfplumber and tabula-py
Examples:
Use a command-line tool like pdftotext to extract the text, then write a custom Python script to parse the resulting text file.
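A sketch of that second step. Running `pdftotext -layout report.pdf report.txt` keeps columns roughly lined up, so each data line can be split on runs of two or more spaces. The file name and the report text below are made up, and the "three cells with a date in the middle" test is an assumption about what a data row looks like in this hypothetical report.

```python
import re

# Made-up text shaped like `pdftotext -layout` output; in practice:
#   text = open("report.txt", encoding="utf-8").read()
text = """\
Quarterly Report                     Page 1

Country         Start date     Duration (days)
Country A       2024-01-05     3
Country B       2024-02-11     12
"""

rows = []
for line in text.splitlines():
    # Split on 2+ spaces so single spaces inside values ("Country A") survive
    cells = re.split(r"\s{2,}", line.strip())
    # Keep only data lines: three cells with a YYYY-MM-DD date in the middle
    if len(cells) == 3 and re.fullmatch(r"\d{4}-\d{2}-\d{2}", cells[1]):
        rows.append({"country": cells[0], "start_date": cells[1], "days": int(cells[2])})

print(rows)
```

From here, `pd.DataFrame(rows)` turns the result into a table you can filter and sort like the examples above.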
Use case: Make a web app
You don’t need to know or remember everything
... just enough to complete your task. Even advanced programmers spend a good chunk of their day Googling things.
Dealing with errors
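A very common first error: a typo in a column name. Column names are case-sensitive, so `"Country"` is not the same as `"country"`, and pandas raises a `KeyError`. The last line of an error message usually names the error type and the value that caused it, which is the part worth reading (and Googling) first. This sketch catches the error and lists the real column names; the table is made up.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Country A"], "gov_justification": ["Unknown"]})

try:
    values = df["Country"]  # typo: capital C
except KeyError as err:
    message = f"No column {err}; available columns: {df.columns.tolist()}"
    print(message)
```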