Swiss Asylum Lottery
Scraping the Federal Administrative Court's Database and Analysing the Verdicts
17. Feb., Python Summit 2017, bsk
How the story began
Where’s the data?
Textfiles
The Court
Who are these judges?
The Plan
A Scrape judges
B Scrape appeal DB
C Analyse verdicts
A The judges
Scraping the BVGer site using BeautifulSoup
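A minimal sketch of this step, assuming the judges are listed in an HTML table on the court's website; the URL and the table structure here are illustrative assumptions, not the actual page layout:

import requests
from bs4 import BeautifulSoup

# Hypothetical URL of the court's list of judges
url = 'http://www.bvger.ch/richterinnen-und-richter'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

judges = []
for row in soup.find_all('tr'):  # assumption: one table row per judge
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    if cells:
        judges.append(cells)  # e.g. [name, party, division]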
B The appeals
This is a little trickier, so I want to go into more depth here.
from selenium import webdriver
import time

driver = webdriver.Firefox()
search_url = 'http://www.bvger.ch/publiws/?lang=de'
driver.get(search_url)

# Open the two asylum divisions in the search tree, then start the search
# (the element IDs come straight from the court's search form)
driver.find_element_by_id('form:tree:n-3:_id145').click()
driver.find_element_by_id('form:tree:n-4:_id145').click()
driver.find_element_by_id('form:_id189').click()
#Navigating to text files
for i in range(0, last_element):  # last_element: number of hits found earlier
    text = driver.find_element_by_class_name('icePnlGrp')  # the verdict's full text
    with open('txtfiles/' + str(i) + '.txt', 'w') as f:
        f.write(text.text)
    driver.find_element_by_id('_id8:_id25').click()  # "next" button to the following verdict
    time.sleep(1)  # give the next verdict time to load
#Visiting and saving all the text files
C Analyse verdicts
This is the trickiest part, so I won't go through the whole code.
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
import time
import dateutil.parser
from collections import Counter
%matplotlib inline
whole_list_of_names = []
for name in glob.glob('txtfiles/*'):
    name = name.split('/')[-1]
    whole_list_of_names.append(name)
#Preparing list of file names
def extract_aktennummer(doc):
    # Docket number, e.g. 'Abteilung IV' followed by 'D-1234/2015'
    nr = re.search(r'Abteilung [A-Z]+\n[A-Z]-[0-9]+/[0-9]+', doc)
    return nr

def extract_entscheid_french(doc):
    # Operative part of a French verdict: 'le Tribunal administratif fédéral prononce: 1. ...'
    entscheid = re.findall(r'le Tribunal administratif fédéral prononce\s*:1.([^.]*)', doc)
    return entscheid[0][:150]
#Developing Regular Expressions
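The German verdicts can be handled with the same pattern; here is a hedged sketch, assuming the operative part opens with the standard formula "Demnach erkennt das Bundesverwaltungsgericht:" (the function name and regex are mine, not from the original code):

def extract_entscheid_german(doc):
    # Assumption: German verdicts introduce their ruling with this formula
    entscheid = re.findall(r'erkennt das Bundesverwaltungsgericht\s*:1.([^.]*)', doc)
    return entscheid[0][:150] if entscheid else None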
def decision_harm_auto(string):
    gutgeheissen = re.search(
        r'gutgeheissen|gutzuheissen|admis|accolto|accolta',
        string)
    if gutgeheissen is not None:
        return 'Gutgeheissen'  # appeal approved
    return 'Abgewiesen'        # appeal dismissed
#Categorising using Regular Expressions
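Putting the pieces together, categorising a single verdict could look like this (the file name and the choice of extractor are illustrative):

with open('txtfiles/0.txt') as f:
    doc = f.read()
print(decision_harm_auto(extract_entscheid_french(doc)))  # 'Gutgeheissen' or 'Abgewiesen'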
for judge in relevant_clean_judges:
    match = re.search(judge, doc)
    if match is not None:
        short_judge_list.append(match.group())
#Looking for the judges
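With one name collected per verdict, the Counter imported above tallies how many verdicts each judge handled (assuming short_judge_list was filled across all verdicts):

judge_counts = Counter(short_judge_list)
print(judge_counts.most_common(10))  # the ten busiest judges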
#First results and visuals
#And after a bit of pandas wrangling, the softest judges...
#...and toughest ones
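The slides only show the charts, but the wrangling might look like this sketch, assuming one row per appeal with columns 'judge' and 'decision' (the DataFrame layout and the decisions list are assumptions):

df = pd.DataFrame({'judge': short_judge_list, 'decision': decisions})
rates = (df.assign(approved=df['decision'] == 'Gutgeheissen')
           .groupby('judge')['approved']
           .agg(['mean', 'count']))
rates.sort_values('mean', ascending=False).head(10)  # highest approval share: the softest judges
rates.sort_values('mean').head(10)                   # lowest approval share: the toughest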
3%
This is the share of appeals I could not categorise automatically, roughly 300 of them. (If I were a scientist, I would now do these by hand, but I'm a lazy journalist...)
Publishing
After talking to lawyers, experts and the court, we published our story “Das Parteibuch der Richter beeinflusst die Asylentscheide” (“The judges' party affiliation influences asylum decisions”) on 10 Oct 2016. The whole research took us three weeks.
The court couldn't accept the results (at first)
A Completely Different Angle
Thanks!
Barnaby Skinner, data journalist @tagesanzeiger & @sonntagszeitung
@barjack, github.com/barjacks, www.barnabyskinner.com