VoxCivica Challenge

Introduction

Hackathon results

General architecture

SEPE Crawler module

Justizianet screen-scraper module

Web interface

Code and links

Authors

Introduction

The focus of this application is to:

  1. Download the Spanish unemployment data and make it available in a computer-friendly format.
  2. Download the Spanish eviction data and make it available in a computer-friendly format.
  3. Show this data graphically.
  4. Correlate both datasets.

The research into which datasets are relevant for this task, as well as the full description of the challenge, was done by VoxCivica and a group of individuals (Jorge Campanillas, Guzmán Garmendia, Iñigo Kortabitarte, David Maeztu, Alberto Ortiz de Zarate).

Hackathon results

General architecture

The general architecture of the application is described in the following figure. Two crawler components, written in Python, download and process data from SEPE (the public institution that manages unemployment in Spain) and Justizianet (the Basque Government justice department), and store the information in a MySQL database. A Django application provides a RESTful API that lets other systems download data from this database as JSON. Finally, a web application consumes this data and charts the evolution of unemployment and its relation to evictions.

SEPE Crawler module

The SEPE crawler downloads all the information about unemployed citizens in Spain from May 2005 to October 2012. All this information is available here as a set of Microsoft Excel files. Each file covers one province of Spain and each town in that province, classifying unemployed people by gender, age and industry, as shown in the following figure:

The crawler component automatically downloads all the files (4679 files; 358 MB), processes them using xlrd, and stores the information in the MySQL database. A snapshot of this database is available here.
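The processing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `UnemploymentRow` layout, the column positions and the single header row are assumptions, since the real spreadsheet layout is not described here. Only `xlrd.open_workbook`, `sheet_by_index`, `nrows` and `row_values` are real xlrd calls.

```python
from collections import namedtuple

# Hypothetical record layout; the real spreadsheet columns are not documented here.
UnemploymentRow = namedtuple("UnemploymentRow", "town men women")


def parse_sheet(sheet, header_rows=1):
    """Turn the rows of an xlrd-style sheet into UnemploymentRow records.

    `sheet` only needs `nrows` and `row_values(i)`, which xlrd sheets provide.
    The column order (town, men, women) is an assumption for illustration.
    """
    records = []
    for i in range(header_rows, sheet.nrows):
        values = sheet.row_values(i)
        town = str(values[0]).strip()
        if not town:
            continue  # skip blank or separator rows
        # xlrd returns numeric cells as floats, so convert the counts to int
        records.append(UnemploymentRow(town, int(values[1]), int(values[2])))
    return records


def parse_file(path):
    """Open one downloaded Excel file and parse its first sheet."""
    import xlrd  # imported lazily so parse_sheet can be used without xlrd

    book = xlrd.open_workbook(path)
    return parse_sheet(book.sheet_by_index(0))
```

Keeping the row logic in `parse_sheet` separate from file access makes it possible to check the parsing rules against a small in-memory stand-in before running them over thousands of files.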

Justizianet screen-scraper module

The screen-scraper module enables users to download information from the Justizianet web page. As the information is only published on the web page in HTML format, the best solution for data extraction was to study the page structure and the queries it uses, and then write code to automate the download. The figure shows an example of the web page that provides the information and needs to be parsed.

The script is implemented in Python using BeautifulSoup as the library for HTML scraping. As we are only interested in results related to evictions, we filter the available records to keep only those that contain “Ej.hipotecari.” in their “Procedimiento Judicial” field. Results are also filtered by searching for certain keywords, to keep only those related to housing. The resulting data is stored in a MySQL database to allow further querying.
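The two filters described above could look something like the sketch below. It assumes each result has already been read into a dict keyed by column title. The “Procedimiento Judicial” field and the “Ej.hipotecari.” marker come from the text; the keyword list, the `Descripcion` field name and the single-table page layout in `extract_records` are hypothetical.

```python
EVICTION_MARKER = "Ej.hipotecari."        # marker named in the text above
HOUSING_KEYWORDS = ("vivienda", "piso")   # hypothetical keyword list


def is_housing_eviction(record, text_field="Descripcion"):
    """Return True for foreclosure records whose text mentions housing.

    `text_field` is a hypothetical column name for the free-text description.
    """
    procedure = record.get("Procedimiento Judicial", "")
    if EVICTION_MARKER not in procedure:
        return False
    text = record.get(text_field, "").lower()
    return any(keyword in text for keyword in HOUSING_KEYWORDS)


def extract_records(html):
    """Read the first HTML table into dicts keyed by header text.

    Assumes one results table whose first row holds the column titles.
    """
    from bs4 import BeautifulSoup  # lazy import: only needed for parsing

    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) == len(headers):  # ignore malformed rows
            records.append(dict(zip(headers, cells)))
    return records
```

A typical use would be `[r for r in extract_records(html) if is_housing_eviction(r)]`, with the surviving records then written to the database.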

Web interface

The web interface has been developed in Django, providing a RESTful query API based on JSON, and some graphics generated with highcharts.js. Here are some screenshots of the application:
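As a rough idea of what the JSON behind such charts might contain, the sketch below groups database rows into one `name`/`data` series per town, the general shape Highcharts accepts for line charts. The row layout `(town, month, count)` and the function name are assumptions; the project's actual API payload is not described here.

```python
import json
from collections import defaultdict


def to_highcharts_series(rows):
    """Group (town, 'YYYY-MM', count) rows into one series per town."""
    by_town = defaultdict(list)
    for town, month, count in rows:
        by_town[town].append((month, count))
    series = []
    for town in sorted(by_town):
        points = sorted(by_town[town])  # 'YYYY-MM' strings sort chronologically
        series.append({"name": town, "data": [[m, c] for m, c in points]})
    return series


if __name__ == "__main__":
    # Made-up sample rows, only to show the output shape
    sample = [("Irun", "2012-02", 5), ("Irun", "2012-01", 4)]
    print(json.dumps(to_highcharts_series(sample), indent=2))
```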

Application initial page

Unemployment evolution in Gipuzkoa (blue line is the capital city -Donosti / San Sebastián-; red line is Irun and green line is Errenteria)

Unemployment evolution in Baleares (seasonal jobs can easily be identified; each line is a different town)

Eviction - unemployment relation in Bizkaia
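This page does not say how the relation between the two datasets is measured. One simple option, shown here purely as an illustration, is the Pearson correlation coefficient between two series aligned by period (for example, monthly unemployment counts and monthly eviction counts for the same province). The function name and inputs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relation. A high value alone does not show that one series causes the other.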

Code and links

Authors

The code was developed during RHoK - Bilbao, December 2012, by the following individuals, sorted alphabetically by last name:

We would also like to acknowledge the original research and the idea itself, organized by VoxCivica and the following individuals, sorted alphabetically by last name: