1 of 37

Basics of Python

Hands-on Workshop

Laura Kurtzberg

Training Director, IRE & NICAR

https://laurajael.com

laurak@ire.org

2 of 37

A Few Prerequisites

🍃 Open marimo, create account, and log in

🌐 Download the data from Access Now's Shutdown Tracker Optimization Project (STOP)

Note:

Chrome or Firefox browser is recommended

https://kutt.it/python-basics

3 of 37

What is Python?

  1. A kind of snake
  2. A programming language that can help you do journalism

(we’ll focus on the second one)

4 of 37

What is Python?

  1. You download some software onto your computer
  2. You set up the software (this can be a pain!)
  3. Now you can write scripts -- instructions for your computer to do a task, written in a way that the software can understand

5 of 37

What are notebooks?

  • "A notebook runs Python code in the same way any .py file you write does; you can use all the same libraries, etc. in the same way a regular Python environment does."
  • "The outputs are presented inside the document instead of in a terminal, making it easier to see what’s happening with your data as you analyze."

6 of 37

What are notebooks?

With cloud-hosted notebooks, such as molab or Colab, we can skip the installation hassles.

But if you decide Python is your jam, you might consider learning how to write and run Python on your computer.

See also other GIJN sessions:

  • No-Code Guide to Using Python (Jupyter Notebooks)
  • Extracting data from PDFs using Python

7 of 37

Marimo and Molab

  • marimo: Open-source Python notebook
  • molab: Cloud-hosted marimo notebook workspace
  • Later on, you can install marimo on your computer

8 of 37

What is pandas?

Open-source library for accessing and analyzing data

import pandas as pd

9 of 37

Congratulations, you’re already a programmer!

10 of 37

Pope Leo X (after Raphael)

Fernando Botero

1964

11 of 37

From an actual post on the NICAR-L listserv:

Here's an example of the formula for calculating % price differential vs. Area score:

=IFERROR(IF(K6=0,0,((COUNT([% price differential vs. Area])+1)-(RANK(R6,[% price differential vs. Area],0)))/COUNT([% price differential vs. Area])*100*(1-IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$3,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$3),PENALTIES!$D$3,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$4,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$4),PENALTIES!$D$4,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$5,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$5),PENALTIES!$D$5,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$6,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$6),PENALTIES!$D$6,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$7,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$7),PENALTIES!$D$7,IF(AND('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$8,'GTA CONDO - MAY 2018 - V2'!K6<=PENALTIES!$C$8),PENALTIES!$D$8,IF('GTA CONDO - MAY 2018 - V2'!K6>PENALTIES!$B$9,PENALTIES!$D$9,""))))))))),0)

12 of 37

Why Python?

Learning a programming language can help you do your analysis in a much cleaner, more readable, more reproducible way.

Scrape

Get Faster

Bigger Data

Automate

Extract from PDFs

13 of 37

Let's try it out!

marimo: Open my notebook link and click on "fork and run"

https://kutt.it/python-basics

14 of 37

Types of Variables

String

var1 = "This class is at GIJN in Kuala Lumpur."

var2 = "My name is " + "Laura"
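
A small sketch of a few things you can do with strings, building on the variables above. The names `name`, `greeting`, and `same_greeting` are made up for illustration and are not from the workshop notebook.

```python
# Strings are text wrapped in quotes.
name = "Laura"

# The + operator joins (concatenates) strings.
greeting = "My name is " + name

# An f-string drops a variable's value straight into the text.
same_greeting = f"My name is {name}"

# len() counts the characters in a string, spaces included.
greeting_length = len(greeting)

print(greeting)
print(greeting_length)
```

Both ways of building the sentence give the same result; f-strings tend to be easier to read once you are mixing several variables into one line of text.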

15 of 37

Types of Variables

Integer / Float

int1 = 100000

float1 = 0.5

my_sum = 1 + 2 / 6.5
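
A short sketch of how Python evaluates the arithmetic above. The names `grouped` and `half` are illustrative additions, not part of the workshop notebook.

```python
int1 = 100000      # an integer: a whole number
float1 = 0.5       # a float: a number with a decimal point

# Division happens before addition, so this is 1 + (2 / 6.5).
my_sum = 1 + 2 / 6.5

# Parentheses change the order: (1 + 2) / 6.5.
grouped = (1 + 2) / 6.5

# Mixing an integer and a float produces a float.
half = int1 * float1

print(type(int1), type(float1))
print(round(my_sum, 3), round(grouped, 3), half)
```

If a calculation gives a surprising answer, adding parentheses is usually the quickest way to check the order of operations.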

16 of 37

Structures

List

my_list = ["hello", "hi", "hola", "olá"]

another_list = [1, 5, 10, 15]
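
A minimal sketch of reading from and adding to a list. The extra greeting and the names `first`, `last`, and `count` are illustrative.

```python
my_list = ["hello", "hi", "hola", "olá"]

# Positions (indexes) start at 0, so the first item is my_list[0].
first = my_list[0]

# A negative index counts from the end of the list.
last = my_list[-1]

# append() adds an item to the end of the list.
my_list.append("bonjour")

# len() reports how many items the list holds.
count = len(my_list)

print(first, last, count)
```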

17 of 37

Structures

Dictionary

fruit_dict = {'Fruit': 'Orange', 'Weight': 10}

shoe_dict = {'Size': 9, 'Brand': 'Saoi'}
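
A minimal sketch of looking things up in a dictionary. The `'Color'` and `'Price'` keys are made up for illustration.

```python
fruit_dict = {'Fruit': 'Orange', 'Weight': 10}

# Look up a value by its key.
fruit_name = fruit_dict['Fruit']

# Assigning to a new key adds a key-value pair.
fruit_dict['Color'] = 'orange'

# get() returns a fallback instead of raising an error when a key is missing.
price = fruit_dict.get('Price', 'unknown')

print(fruit_name, price, list(fruit_dict.keys()))
```

A row of a spreadsheet maps neatly onto a dictionary: the column names are the keys and the cell contents are the values.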

18 of 37

Logic

Conditional Statement

score = 1

if score > 2:
    print("Win")
else:
    print("Lose")
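
A small sketch extending the conditional with a third branch. The "Tie" case and the `result` variable are illustrative additions. Note that the lines under `if`, `elif`, and `else` must be indented.

```python
score = 2

if score > 2:
    result = "Win"
elif score == 2:
    # elif ("else if") is only checked when the first test is False.
    result = "Tie"
else:
    result = "Lose"

print(result)
```

Python checks the branches from top to bottom and runs only the first one whose condition is true.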

19 of 37

Logic

For Loop

fruit_list = ['apple', 'pear', 'lychee']

for fruit in fruit_list:
    print(fruit)
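
A minimal sketch that combines a loop with a conditional. The length cutoff and the `long_names` list are made up for illustration.

```python
fruit_list = ['apple', 'pear', 'lychee']

long_names = []
for fruit in fruit_list:
    # The indented lines run once for each item in the list.
    if len(fruit) > 4:
        long_names.append(fruit)

print(long_names)
```

This "loop, test, keep" pattern is the plain-Python version of the row filtering that pandas does in one line.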

20 of 37

Row-wise filtering

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

lgbt = df[df.users_targeted == 'LGBTQ groups']
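
A sketch of what the filter is doing behind the scenes, using a tiny made-up table instead of the real CSV so it runs anywhere pandas is installed. The rows and the `mask` variable are illustrative; only the column names come from the workshop data.

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from KeepItOn_2024.csv.
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'users_targeted': ['LGBTQ groups', 'Protesters', 'LGBTQ groups'],
})

# The comparison produces one True/False value per row (a "mask").
mask = df['users_targeted'] == 'LGBTQ groups'

# Indexing with the mask keeps only the rows marked True.
lgbt = df[mask]

print(mask.tolist())
print(len(lgbt))
```

Writing `df['users_targeted']` is equivalent to `df.users_targeted`; the bracket form also works for column names that contain spaces.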

21 of 37

Selecting columns of data

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

country_and_justification = df[['country', 'gov_justification']]
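
A sketch of the difference between single and double brackets, using a tiny made-up table. The rows and the `countries` variable are illustrative; only the column names come from the workshop data.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'country': ['A', 'B'],
    'start_date': ['2024-01-05', '2024-03-10'],
    'gov_justification': ['Exams', 'Unknown'],
})

# Single brackets with one name return a Series (a single column).
countries = df['country']

# Double brackets (a list of names) return a DataFrame with those columns.
country_and_justification = df[['country', 'gov_justification']]

print(type(countries).__name__)
print(list(country_and_justification.columns))
```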

22 of 37

Sorting data

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

oldest = df.sort_values('start_date', ascending=True)
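
A sketch of sorting by date on a tiny made-up table. It assumes the dates arrive as text, which is what `read_csv` gives you by default, and converts them first so the sort is chronological rather than alphabetical. The rows and the `newest` variable are illustrative.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'start_date': ['2024-03-10', '2024-01-05', '2024-02-20'],
})

# Convert text to real dates before sorting.
df['start_date'] = pd.to_datetime(df['start_date'])

# ascending=True puts the earliest date first; False reverses the order.
oldest = df.sort_values('start_date', ascending=True)
newest = df.sort_values('start_date', ascending=False)

print(oldest['country'].tolist())
print(newest['country'].tolist())
```

`sort_values` returns a new, sorted table and leaves `df` itself in its original order.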

23 of 37

Grouping data – counts

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

counts = df.value_counts(subset=['country'])
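
A sketch of two common ways to count rows per group, using a tiny made-up table. The data and the `sizes` variable are illustrative.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({'country': ['A', 'B', 'A', 'C', 'A', 'B']})

# value_counts() tallies rows per value, most frequent first.
counts = df['country'].value_counts()

# groupby(...).size() gives the same tallies, ordered by group name instead.
sizes = df.groupby('country').size()

print(counts.to_dict())
print(sizes.to_dict())
```

`value_counts` is the quick option for a single question ("which country appears most?"); `groupby` is the more general tool once you want sums, averages, or other summaries per group.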

24 of 37

Adding a new column to the data

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

df2 = df.copy()  # copy, so changes to df2 do not alter df

df2['today'] = pd.to_datetime('today').normalize()

25 of 37

Adding a new column to the data

import pandas as pd

df = pd.read_csv('KeepItOn_2024.csv')

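df2 = df.copy()  # copy, so changes to df2 do not alter df

# Assumes start_date holds text that pandas can parse as dates
df2['start_date'] = pd.to_datetime(df2['start_date'])

df2['today'] = pd.to_datetime('today').normalize()

# Days between the two dates (negative when start_date is in the past)
df2['diff'] = (df2['start_date'] - df2['today']).dt.days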
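
A self-contained sketch of the same date arithmetic on a tiny made-up table. It uses a fixed `reference` date instead of today's date so the result is the same every time it runs, and it subtracts in the other direction so past dates give positive numbers. The rows and the `reference` column are illustrative assumptions.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'country': ['A', 'B'],
    'start_date': ['2024-01-01', '2024-01-31'],
})

# copy() makes an independent table; plain `df2 = df` would only
# create a second name for the same data.
df2 = df.copy()

df2['start_date'] = pd.to_datetime(df2['start_date'])

# A fixed reference date keeps this example reproducible.
df2['reference'] = pd.Timestamp('2024-02-10')

# Subtracting two date columns gives a time difference;
# .dt.days pulls out the whole number of days.
df2['diff'] = (df2['reference'] - df2['start_date']).dt.days

print(df2['diff'].tolist())
print(list(df.columns))
```

Because `df2` is a copy, the original `df` still has only its two original columns afterwards.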

26 of 37

Appendix

Basics of Python

27 of 37

Use case: Analyze data

Popular tools: pandas

  • Automation, speed, replicability: Prep and analyze your data, quickly, the same way every time
  • Transparency: Show audience that your story's conclusions are solid

28 of 37

Use case: Analyze data

Examples:

29 of 37

Use case: Get alerted to newsworthy items

Is anything new/newsworthy here? Get pinged via email, Slack, Twitter, text message, whatever.

Example: Slack me if the nuclear power plant near Omaha had a “reportable event” today or yesterday.

30 of 37

Use case: Scrape a website

Editor: ahhhhhhh breaking news, I need a spreadsheet with the names and locations of every U.S. nuclear power plant that has a pressurized water reactor like now

You: *cracks knuckles*

Popular setup: requests, bs4

31 of 37

Use case: Scrape a website

Examples:

32 of 37

Use case: Extract data from a PDF

Our favorite tools are NaturalPDF, pdfplumber and tabula-py

Examples:

Use a command-line tool like pdftotext to extract the text, then write a custom Python script to parse the resulting text file.
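
A rough sketch of the "parse the resulting text file" step, using only Python's standard library. The sample text, plant names, and column layout are made up; real pdftotext output varies from PDF to PDF and usually needs its own parsing rules.

```python
import re

# Made-up text standing in for what pdftotext might write to a file.
text = """\
Plant Name        State   Reactors
Alpha Station     NE      1
Beta Point        TX      2
"""

rows = []
for line in text.splitlines()[1:]:      # skip the header line
    # Split on runs of two or more spaces, the gaps between columns.
    parts = re.split(r"\s{2,}", line.strip())
    if len(parts) == 3:
        name, state, reactors = parts
        rows.append({"name": name, "state": state, "reactors": int(reactors)})

print(rows)
```

Splitting on two or more spaces keeps multi-word names like "Alpha Station" together. From here, a list of dictionaries like `rows` can be handed to pandas or written out as a CSV.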

33 of 37

Use case: GIS analysis

Popular tools: geopandas, fiona, shapely

Examples:

34 of 37

Use case: Make a web app

Examples:

Common Python web frameworks:

  • Django
  • Flask

35 of 37

You don’t need to know or remember everything

... just enough to complete your task. Even advanced programmers spend a good chunk of their day Googling things.

36 of 37

Dealing with errors

  • Effective Googling is a skill that we are going to practice
  • There are a ton of people who want to help you succeed, and they don’t judge!
  • Never be afraid of “looking dumb” or “being basic” when you ask a question of journalists in our community

37 of 37

Links!