Swiss Asylum Lottery

Scraping the Federal Administrative Court's Database and Analysing the Verdicts

17. Feb., Python Summit 2017, bsk

2 min. My name, what I do, and where I’ve come from. Very grateful to be able to speak to you today. This time last year it would have been impossible, because I hadn’t written a line of code.

I have spent a lot of time writing about technology. But about 3 years ago I started moving towards more data-driven reporting. And last year I went to New York to the Columbia School of Journalism, where I attended the computer-assisted reporting course called Lede 12. Tough, a lot of sleepless nights. Python, libraries like pandas, even scikit-learn, where we played around with the iris data, and the Titanic data of course.

During this course I learnt how to programme. And basically a new world opened up for me. I realised how powerful the combination of coding skills and storytelling is, and how it can shape our world and, at least I hope, make it a better place.

How the story began


3 min. In a meeting with the Recherche team, over a cup of coffee, lawyers were complaining about how differently judges deal with appeals. They basically know, before they even read the verdict, how things are going to go. Basically just a hunch, no real data to back it up. So I sat there and thought: if all the verdicts are available, then there must be a way to analyse them systematically. Are they all there?

Where’s the data?


5 min. The appeals and the verdicts are actually all available. Here they are -> Judgments -> FAC database. It's just that nobody has taken the time to analyse them. Abteilung IV & V are the two asylum sections. Search for them all: currently 30'000+ decisions.

Textfiles


6 min. The verdicts are plain text files. Each one contains a case number, the date, the names of the judges, the plaintiff's country of origin, the lawyer, and the decision. And all of that in 3 languages.

The Court


8 min. We also have to understand a little bit about the whole process. The BVGer is the Swiss Federal Administrative Court, which deals with appeals against decisions of the federal administration. With 72 judges it is actually the largest court in Switzerland. Basically, each appeal is dealt with within 30 days. There are two asylum sections, and each case is distributed randomly to the judges by a computer. There are slight adjustments depending on the language somebody speaks, but basically the distribution is random.

Who are these judges?


10 min. Who are the judges? Swiss federal judges are appointed by parliament, and almost all of them have party affiliations; there are only a handful of judges without one. The current judges are listed on the BVGer website. The former ones we got by asking the parliamentary judicial committee (Gerichtskommission).

The Plan


A Scrape judges

B Scrape appeal DB

C Analyse verdicts

10.10 min

A The judges

Scraping the BVGer site using BeautifulSoup


import requests

from bs4 import BeautifulSoup

import pandas as pd

Code on Github

11 min. Don't want to go into detail on this. We basically request the page listing the judges, parse it with BeautifulSoup and collect the names and party affiliations in a pandas DataFrame.
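A minimal sketch of that approach, assuming the judges are listed in a simple HTML table; the URL and the selectors here are placeholders of mine, the real ones are in the code on Github:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.bvger.ch/the-judges-page'  # placeholder URL, not the real one
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

judges = []
for row in soup.select('table tr'):  # assumed selector: one table row per judge
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    if len(cells) >= 2:
        judges.append({'name': cells[0], 'partei': cells[1]})

df_judges = pd.DataFrame(judges)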

B THE APPEALS

This is a little trickier, so I want to go into a little more depth here.


import requests

import selenium

from selenium import webdriver

import time

import glob

12 min

driver = webdriver.Firefox()

search_url = 'http://www.bvger.ch/publiws/?lang=de'
driver.get(search_url)

# click through the publiws navigation tree to the two asylum sections
# and launch the search for all their decisions
driver.find_element_by_id('form:tree:n-3:_id145').click()
driver.find_element_by_id('form:tree:n-4:_id145').click()
driver.find_element_by_id('form:_id189').click()

#Navigating to text files
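One practical note: ids like 'form:tree:n-3:_id145' belong to a dynamic, ICEfaces-style page, so an element is not always present the moment you ask for it. A minimal sketch of the same clicks wrapped in explicit waits; the WebDriverWait part is my addition, not from the original scraper:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for each element
wait.until(EC.element_to_be_clickable((By.ID, 'form:tree:n-3:_id145'))).click()
wait.until(EC.element_to_be_clickable((By.ID, 'form:tree:n-4:_id145'))).click()
wait.until(EC.element_to_be_clickable((By.ID, 'form:_id189'))).click()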


13 min

# last_element = total number of search results (30'000+ decisions at the time)
for file in range(0, last_element):
    # grab the text of the verdict currently displayed
    Text = driver.find_element_by_class_name('icePnlGrp')
    counter = str(file)
    # write it to its own text file...
    file = open('txtfiles/' + counter + ".txt", "w")
    file.write(Text.text)
    file.close()
    # ...and click through to the next verdict
    driver.find_element_by_id("_id8:_id25").click()

#Visiting and saving all the text files
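glob was imported above; a small sanity check in that spirit, assuming the files land in txtfiles/ as in the loop (the resume idea is just a suggestion of mine):

import glob

saved = glob.glob('txtfiles/*.txt')
print(len(saved), 'verdicts saved so far')
# if the scrape is interrupted, restart the loop at len(saved) instead of 0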


16 min

C Analyse verdicts

This is the trickiest part. I won't go through the whole code.

import re

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import glob

import time

import dateutil.parser

from collections import Counter

%matplotlib inline

The entire code

17 min

whole_list_of_names = []
for name in glob.glob('txtfiles/*'):
    # keep only the file name, not the 'txtfiles/' prefix
    name = name.split('/')[-1]
    whole_list_of_names.append(name)

#Preparing list of file names


17.30 min.

def extract_aktennummer(doc):
    # the case number, e.g. "Abteilung IV\nD-1234/2015" (example format only)
    nr = re.search(r'Abteilung [A-Z]+\n[A-Z]-[0-9]+/[0-9]+', doc)
    return nr

def extracting_entscheid_french(doc):
    # French verdicts introduce the operative part with
    # "le Tribunal administratif fédéral prononce : 1. ..."
    entscheid = re.findall(r'le Tribunal administratif fédéral prononce\s*:1.([^.]*)', doc)
    return entscheid[0][:150]

#Developing Regular Expressions


19.30 min. This had to be done in 3 languages of course, and we did it for the case number, the date, the decision, the nationality of the plaintiffs, and the lawyers.
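For example, a German counterpart along the same lines; the exact wording of the formula here is my assumption about how the German verdicts introduce the operative part:

def extracting_entscheid_german(doc):
    # assumed formula: "Demnach erkennt das Bundesverwaltungsgericht: 1. ..."
    entscheid = re.findall(r'erkennt das Bundesverwaltungsgericht\s*:\s*1\.([^.]*)', doc)
    if entscheid:
        return entscheid[0][:150]
    return None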

def decision_harm_auto(string):
    # appeal upheld? The keywords cover all three languages:
    # gutgeheissen/gutzuheissen (German), admis (French), accolto/accolta (Italian)
    gutgeheissen = re.search(r'gutgeheissen|gutzuheissen|admis|accolto|accolta', string)
    if gutgeheissen is not None:
        string = 'Gutgeheissen'
    else:
        string = 'Abgewiesen'
    return string

#Categorising using Regular Expressions


Here we are categorising the decisions. A simple solution; we also worked with more complicated patterns to catch the judges' typos.
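A minimal usage sketch, assuming the extracted operative parts are in a pandas DataFrame column called 'entscheid' (the DataFrame and column names are my assumptions):

df['kategorie'] = df['entscheid'].apply(decision_harm_auto)
df['kategorie'].value_counts()  # how many Gutgeheissen vs. Abgewiesen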

for judge in relevant_clean_judges:
    # does this judge's name appear anywhere in the verdict text?
    judge = re.search(judge, doc)
    if judge is not None:
        judge = judge.group()
        short_judge_list.append(judge)
    else:
        continue

#Looking for the judges


Here I'm basically just iterating through the files to find the judges in each verdict, using the list of judges we scraped before, including the judges who are no longer in office.
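Counter was imported above; a minimal sketch of what it is useful for at this point, assuming short_judge_list ends up holding one entry per judge and verdict (my assumption about the final structure):

from collections import Counter

judge_counts = Counter(short_judge_list)
judge_counts.most_common(10)  # the judges who handled the most appeals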

#First results and visuals


Using pandas to visualise the first results (for this I obviously had to harmonise the dates first).
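A minimal sketch of that harmonisation and a first plot, assuming a DataFrame df with one row per verdict and a raw date string column 'datum' that dateutil can parse (the column names are my assumptions):

df['datum'] = df['datum'].apply(dateutil.parser.parse)
df['jahr'] = df['datum'].apply(lambda d: d.year)
df.groupby('jahr').size().plot(kind='bar')  # number of verdicts per year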

#And after a bit of pandas wrangling, the softest judges...



#...and toughest ones


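A minimal sketch of the kind of pandas wrangling behind these two rankings, assuming a DataFrame df with one row per verdict and columns 'richter' (judge) and 'kategorie' (Gutgeheissen/Abgewiesen); the column names and the case cut-off are my assumptions:

df['gutgeheissen'] = df['kategorie'] == 'Gutgeheissen'
rates = df.groupby('richter')['gutgeheissen'].agg(['mean', 'count'])
rates = rates[rates['count'] >= 100]  # only judges with a reasonable number of cases
rates.sort_values('mean', ascending=False).head(10)  # the softest judges...
rates.sort_values('mean', ascending=True).head(10)   # ...and the toughest ones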

3%

This is the share of appeals I could not categorise automatically. So approximately 300. (If I were a scientist I would now do these by hand, but I'm a lazy journalist...)

Publishing

After talking to lawyers, experts and the court, we published our story "Das Parteibuch der Richter beeinflusst die Asylentscheide" ("The judges' party affiliation influences asylum decisions") on 10 Oct 2016. The whole research took us 3 weeks.


The court couldn't accept the results (at first)

A Completely Different Angle

Thanks!

Barnaby Skinner, data journalist @tagesanzeiger & @sonntagszeitung

@barjack, github.com/barjacks, www.barnabyskinner.com
