Swiss Asylum Lottery

Scraping the Federal Administrative Court's Database and Analysing the Verdicts

17. Feb., Python Summit 2017, bsk

2 min. My name, what I do, and where I’ve come from. Very grateful to be able to speak to you today. This time last year it would have been impossible, because I hadn’t written a line of code.

I have spent a lot of time writing about technology. But about 3 years ago I started moving towards more data-driven reporting. And last year I went to New York to the Columbia School of Journalism, where I attended the computer-assisted reporting course called Lede 12. Tough, a lot of sleepless nights. Python, libraries like pandas, even scikit-learn, where we played around with the iris data, and the Titanic data of course.

During this course I learnt how to programme. And basically a new world opened up for me. I realised how powerful the combination of coding skills and storytelling is, and how it can shape our world and, at least I hope, make it a better place.

How the story began


3 min. In a meeting with the Recherche team, over a cup of coffee, lawyers were complaining about how differently judges deal with appeals. They basically know, before they even read the verdict, how things are going to go. Basically just a hunch, no real data to back it up. So I sat there and thought: if all the verdicts are available, then there must be a way to analyse them systematically. Are they all there?

Where’s the data?


5 min. The appeals and the verdicts are actually all available. Here they are -> Judgments -> FAC database. It's just that nobody has taken the time to analyse them. Abteilung IV & V are the two asylum sections. Search for them all: currently 30'000+ decisions.

Textfiles


6 min. The verdicts are plain text files. Each one contains a case number, the date, the names of the judges, the plaintiff's country of origin, the lawyer, and the decision. And all of that in 3 languages.

The Court


8 min. We also have to understand a little bit about the whole process. The BVGer is the Swiss Federal Administrative Court, which deals with appeals against decisions of the federal administration. With 72 judges it is actually the largest court in Switzerland. Basically, each appeal is dealt with within 30 days. There are two asylum sections, and each case is distributed randomly to the judges by a computer. There are slight adjustments depending on the language somebody speaks, but basically the distribution is random.

Who are these judges?


10 min. Who are the judges? Swiss federal judges are appointed by parliament, and almost all of them have party affiliations; there are only a handful of judges without one. The current judges are listed on the BVGer website. The former ones we got by asking the parliamentary judicial committee (Gerichtskommission).

The Plan


A Scrape judges

B Scrape appeal DB

C Analyse verdicts

10.10 min

A The judges

Scraping the BVGer site using BeautifulSoup


import requests

from bs4 import BeautifulSoup

import pandas as pd

Code on Github

11 min. Don't want to go into detail on this. We basically request the page listing the judges, parse it with BeautifulSoup and collect the names and party affiliations in a pandas DataFrame.
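A minimal sketch of that approach, assuming the judges are listed in a simple HTML table; the URL and the selectors here are placeholders of mine, the real ones are in the code on Github:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.bvger.ch/the-judges-page'  # placeholder URL, not the real one
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

judges = []
for row in soup.select('table tr'):  # assumed selector: one table row per judge
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    if len(cells) >= 2:
        judges.append({'name': cells[0], 'partei': cells[1]})

df_judges = pd.DataFrame(judges)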

B THE APPEALS

This is a little trickier, so I want to go into a little more depth here.


import requests

import selenium

from selenium import webdriver

import time

import glob

12 min

driver = webdriver.Firefox()

search_url = 'http://www.bvger.ch/publiws/?lang=de'
driver.get(search_url)

# click through the publiws navigation tree to the two asylum sections
# and launch the search for all their decisions
driver.find_element_by_id('form:tree:n-3:_id145').click()
driver.find_element_by_id('form:tree:n-4:_id145').click()
driver.find_element_by_id('form:_id189').click()

#Navigating to text files
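One practical note: ids like 'form:tree:n-3:_id145' belong to a dynamic, ICEfaces-style page, so an element is not always present the moment you ask for it. A minimal sketch of the same clicks wrapped in explicit waits; the WebDriverWait part is my addition, not from the original scraper:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for each element
wait.until(EC.element_to_be_clickable((By.ID, 'form:tree:n-3:_id145'))).click()
wait.until(EC.element_to_be_clickable((By.ID, 'form:tree:n-4:_id145'))).click()
wait.until(EC.element_to_be_clickable((By.ID, 'form:_id189'))).click()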


13 min

# last_element = total number of search results (30'000+ decisions at the time)
for file in range(0, last_element):
    # grab the text of the verdict currently displayed
    Text = driver.find_element_by_class_name('icePnlGrp')
    counter = str(file)
    # write it to its own text file...
    file = open('txtfiles/' + counter + ".txt", "w")
    file.write(Text.text)
    file.close()
    # ...and click through to the next verdict
    driver.find_element_by_id("_id8:_id25").click()

#Visiting and saving all the text files
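glob was imported above; a small sanity check in that spirit, assuming the files land in txtfiles/ as in the loop (the resume idea is just a suggestion of mine):

import glob

saved = glob.glob('txtfiles/*.txt')
print(len(saved), 'verdicts saved so far')
# if the scrape is interrupted, restart the loop at len(saved) instead of 0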


16 min

C Analyse verdicts

This is the trickiest part. I won't go through the whole code.

import re

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import glob

import time

import dateutil.parser

from collections import Counter

%matplotlib inline

The entire code

17 min

whole_list_of_names = []
for name in glob.glob('txtfiles/*'):
    # keep only the file name, not the 'txtfiles/' prefix
    name = name.split('/')[-1]
    whole_list_of_names.append(name)

#Preparing list of file names


17.30 min.

def extract_aktennummer(doc):
    # the case number, e.g. "Abteilung IV\nD-1234/2015" (example format only)
    nr = re.search(r'Abteilung [A-Z]+\n[A-Z]-[0-9]+/[0-9]+', doc)
    return nr

def extracting_entscheid_french(doc):
    # French verdicts introduce the operative part with
    # "le Tribunal administratif fédéral prononce : 1. ..."
    entscheid = re.findall(r'le Tribunal administratif fédéral prononce\s*:1.([^.]*)', doc)
    return entscheid[0][:150]

#Developing Regular Expressions


19.30 min. This had to be done in 3 languages of course, and we did it for the case number, the date, the decision, the nationality of the plaintiffs, and the lawyers.
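For example, a German counterpart along the same lines; the exact wording of the formula here is my assumption about how the German verdicts introduce the operative part:

def extracting_entscheid_german(doc):
    # assumed formula: "Demnach erkennt das Bundesverwaltungsgericht: 1. ..."
    entscheid = re.findall(r'erkennt das Bundesverwaltungsgericht\s*:\s*1\.([^.]*)', doc)
    if entscheid:
        return entscheid[0][:150]
    return None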

def decision_harm_auto(string):
    # appeal upheld? The keywords cover all three languages:
    # gutgeheissen/gutzuheissen (German), admis (French), accolto/accolta (Italian)
    gutgeheissen = re.search(r'gutgeheissen|gutzuheissen|admis|accolto|accolta', string)
    if gutgeheissen is not None:
        string = 'Gutgeheissen'
    else:
        string = 'Abgewiesen'
    return string

#Categorising using Regular Expressions


Here we are categorising the decisions. A simple solution; we also worked with more complicated patterns to catch the judges' typos.
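A minimal usage sketch, assuming the extracted operative parts are in a pandas DataFrame column called 'entscheid' (the DataFrame and column names are my assumptions):

df['kategorie'] = df['entscheid'].apply(decision_harm_auto)
df['kategorie'].value_counts()  # how many Gutgeheissen vs. Abgewiesen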

for judge in relevant_clean_judges:
    # does this judge's name appear anywhere in the verdict text?
    judge = re.search(judge, doc)
    if judge is not None:
        judge = judge.group()
        short_judge_list.append(judge)
    else:
        continue

#Looking for the judges


Here I'm basically just iterating through the files to find the judges in each verdict, using the list of judges we scraped before, including the judges who are no longer in office.
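Counter was imported above; a minimal sketch of what it is useful for at this point, assuming short_judge_list ends up holding one entry per judge and verdict (my assumption about the final structure):

from collections import Counter

judge_counts = Counter(short_judge_list)
judge_counts.most_common(10)  # the judges who handled the most appeals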

#First results and visuals


Using pandas to visualise the first results (for this I obviously had to harmonise the dates first).
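A minimal sketch of that harmonisation and a first plot, assuming a DataFrame df with one row per verdict and a raw date string column 'datum' that dateutil can parse (the column names are my assumptions):

df['datum'] = df['datum'].apply(dateutil.parser.parse)
df['jahr'] = df['datum'].apply(lambda d: d.year)
df.groupby('jahr').size().plot(kind='bar')  # number of verdicts per year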

#And after a bit of pandas wrangling, the softest judges...



#...and toughest ones


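A minimal sketch of the kind of pandas wrangling behind these two rankings, assuming a DataFrame df with one row per verdict and columns 'richter' (judge) and 'kategorie' (Gutgeheissen/Abgewiesen); the column names and the case cut-off are my assumptions:

df['gutgeheissen'] = df['kategorie'] == 'Gutgeheissen'
rates = df.groupby('richter')['gutgeheissen'].agg(['mean', 'count'])
rates = rates[rates['count'] >= 100]  # only judges with a reasonable number of cases
rates.sort_values('mean', ascending=False).head(10)  # the softest judges...
rates.sort_values('mean', ascending=True).head(10)   # ...and the toughest ones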

3%

This is the share of appeals I could not categorise automatically. So approximately 300. (If I were a scientist I would now do these by hand, but I'm a lazy journalist...)

Publishing

After talking to lawyers, experts and the court, we published our story "Das Parteibuch der Richter beeinflusst die Asylentscheide" ("The judges' party affiliation influences asylum decisions") on 10 Oct 2016. The whole research took us 3 weeks.


The court couldn't accept the results (at first)

A Completely Different Angle

Thanks!

Barnaby Skinner, data journalist @tagesanzeiger & @sonntagszeitung

@barjack, github.com/barjacks, www.barnabyskinner.com
