1 of 38

Hacking Challenges

2 of 38

Challenge 1:

«Optimizing the reporting app»

by Tobias Brunner, GIS-Zentrum Stadt Zürich

3 of 38

Züri wie neu

  • Platform to report defects in the city’s infrastructure
  • > 16’000 reports since 2013
  • > 5’200 users
  • ~40% of reports are submitted by new users
  • ~11 min to administer a report
  • Low threshold for users

4 of 38

What we need

  1. Lower the threshold even more: reduce the steps a user needs to report a defect
  2. Lower administrative costs: reduce the need to reassign reports to the correct category

  • Automatic assignment of reports to categories (a minimal classification sketch follows below)
  • … your idea?
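
A minimal sketch of automatic category assignment, assuming the report data is exported as a CSV with the report text in a `description` column and the verified category in `service_code` (both fields appear on the resource slide; the file name and exact column names are assumptions to adapt to the actual export):

```python
# A minimal sketch: predict the verified category (service_code) from the
# report description. File and column names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

reports = pd.read_csv("zueriwieneu_meldungen.csv")  # hypothetical export
X_train, X_test, y_train, y_test = train_test_split(
    reports["description"].astype(str), reports["service_code"],
    test_size=0.2, random_state=0)

# Character n-grams are robust to the spelling variants and dialect found
# in citizen reports:
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Position, time and photo could enter as additional features; the text baseline above just indicates where to start.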

5 of 38

Resources

Report data: https://data.stadt-zuerich.ch/dataset/zueriwieneu-meldungen

Fields: description, position (e, n), time (requested datetime), photo (media_url), interface_used

Various geodatasets: https://data.stadt-zuerich.ch/, https://opendata.swiss/de/organization/geoinformation-kanton-zuerich

service_code: ground truth (verified category)

6 of 38

Challenge 2:

«Extracting individual trees from LIDAR»

by Katharina Kälin, Statistisches Amt Kanton Zürich

7 of 38

Problem: How “green” is Zürich?

[Figure: trees planted by the municipal gardeners → available data; trees planted by private people → missing data]

8 of 38

What we need: LIDAR Data

Extract individual trees from LIDAR data and assign attributes to them (a detection sketch follows the list):

  • tree position
  • tree height
  • tree diameter
  • tree species or tree classification (e.g. coniferous vs. deciduous)

  • district name
  • location (garden, balcony, park etc.)
  • ...
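
A rough sketch of one possible detection approach (local maxima on a canopy height model), assuming a height-normalized LAS tile (ground = 0 m) and laspy 2.x; the file name, grid resolution and height threshold are assumptions to tune against the real data:

```python
# A rough sketch: rasterize a canopy height model (CHM) from a normalized
# point cloud and treat local maxima as treetops. All parameters are
# assumptions.
import laspy
import numpy as np
from scipy.ndimage import maximum_filter

las = laspy.read("zurich_tile_normalized.las")  # hypothetical tile
x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)

# CHM: highest return per 1 m grid cell.
res = 1.0
cols = ((x - x.min()) / res).astype(int)
rows = ((y - y.min()) / res).astype(int)
chm = np.zeros((rows.max() + 1, cols.max() + 1))
np.maximum.at(chm, (rows, cols), z)

# Treetops = local maxima of the CHM above a minimum tree height (3 m).
is_top = (maximum_filter(chm, size=5) == chm) & (chm > 3.0)
for r, c in np.argwhere(is_top):
    print(f"tree at E={x.min() + c * res:.1f}, N={y.min() + r * res:.1f}, "
          f"height={chm[r, c]:.1f} m")
```

Crown diameter and coniferous/deciduous classification could build on this, e.g. via watershed segmentation around each treetop and per-crown return statistics.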

9 of 38

Resources

10 of 38

Challenge 3:

«OpenStreetMap POI Completeness»

by Raphael Das Gupta, HSR Geometa Lab

11 of 38

Problem

The completeness of OSM is mostly unknown, but knowing it would be useful for:

  • deciding where/what to map
  • determining whether OSM is “good enough” for your specific purpose

This challenge: Estimate completeness of POI (Points of Interest, e.g. shops, bars, restaurants, …)

12 of 38

What we need

Develop intrinsic (within OSM data) approach(es) for estimating OSM POI completeness. Verify/tune extrinsically (by comparing to non-OSM data). A reference-counting sketch follows the steps.

  • Step 1: collect reference data (e.g. AOI)
  • Step 2: learn intrinsic approaches
  • Step 3: discuss approaches
  • Step 4: decide on one approach
  • Step 5: code and/or visualize
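
A minimal sketch for step 1: count the POIs currently mapped in an area of interest via the Overpass API. The bounding box roughly covers Zurich’s old town and is purely illustrative:

```python
# A minimal sketch: tally mapped shop POIs in a bounding box via the
# Overpass API, as a baseline for intrinsic completeness measures.
import requests

query = """
[out:json][timeout:60];
node["shop"](47.368,8.537,47.376,8.548);
out tags;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data=query)
resp.raise_for_status()

counts = {}
for poi in resp.json()["elements"]:
    shop = poi["tags"]["shop"]
    counts[shop] = counts.get(shop, 0) + 1
for shop, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{shop}: {n}")
```

Comparing such per-type counts against the official and third-party data listed under Resources is one way to anchor the extrinsic verification.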

13 of 38

Resources

OpenStreetMap data (current & history)

Earth (big!): planet.osm.org

Switzerland: planet.osm.ch

Other countries and regions: osm-internal.download.geofabrik.de

Login may be required to access OSM history. Please respect the privacy of mappers!

Official & 3rd-Party comparison data: md.coredump.ch/SDD18ZHHack-OSM-POI#data-sources

OGD (City of Zurich)

Proprietary data (Cities of Zurich and Geneva)

Tools, APIs, documentation, literature ...

14 of 38

Challenge 4:

«Online Search Behavior and Government Information»

by Andrea Schnell, Statistisches Amt Kanton Zürich

15 of 38

zh.ch

The website of the Canton of Zurich (zh.ch) is the digital interface between citizens and the cantonal public administration.

Hierarchical organizational structures & large quantities of content → not always easy to find the most relevant information!

Do content and structure mirror the needs of our users?


16 of 38

What we need

Analyze website traffic / web search data to find out:

  • What do people looking for government information search for?
  • What patterns can be found in the aggregated web traffic and search engine data?
  • Are there clusters of search terms that are equally related to pages of different administration units?
  • Do the search terms people use before navigating to zh.ch match official language?

→ Help us make zh.ch better. Your opinion and the insights you can provide count! (A clustering sketch follows below.)
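
A minimal clustering sketch for the search-term questions above, assuming a CSV export of search terms; the file name and the `query` column are assumptions to adapt to the files on GitHub:

```python
# A minimal sketch: cluster search terms by character n-gram similarity.
# File and column names are assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

terms = pd.read_csv("search_terms.csv")  # hypothetical export
queries = terms["query"].astype(str)

# Character n-grams are robust to the spelling variants typical of
# German-language search queries:
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(queries)
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

terms["cluster"] = km.labels_
for cluster, group in terms.groupby("cluster"):
    print(cluster, group["query"].head(5).tolist())
```

Comparing each cluster’s vocabulary against the A–Z list of topics would then indicate how well search language matches official language.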

17 of 38

Resources

Google Search Terms related to zh.ch

data (caveat: varying timespans for different zh.ch domains!)

Web Analytics & Google Search Data for kapo.zh.ch / statistik.zh.ch

data (.zip)

List of Topics (A-Z) → “official language”

data

GitHub: https://github.com/statistikZH/SDD2018ZHHACK

18 of 38

Challenge 5:

«Automatic detection of color for strip tests for water quality»

by Lukas Müller, Barbara Strobl and Simon Etter, University of Zurich

19 of 38

Problem

  • Eliminate the subjectivity of color comparison for water-quality strip tests & make it more fun for citizen scientists
  • App code should automatically detect the color and the corresponding water-quality value

20 of 38

What we need

You have 25 sample images available. The code should… (see the sketch after this list)

  • …recognise the four corners of the reference color palette.
  • …find the color on the palette that most closely matches the color on the stick.
  • …find the value for the matching color based on the location on the palette.
  • …print the value for the water quality parameter.
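
A minimal sketch of the matching steps, assuming the four palette corners have already been found and used to locate the reference fields; all pixel coordinates and values below are hypothetical:

```python
# A minimal sketch: match the strip color to the nearest reference field
# and print its encoded value. Coordinates and values are hypothetical.
import cv2
import numpy as np

img = cv2.imread("sample_01.jpg")  # hypothetical sample photograph
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(float)

# Hypothetical palette: value encoded by each reference field -> its pixel
# position (row, col), derived from the detected corners in a real app.
palette = {0.0: (120, 80), 0.5: (120, 160), 1.0: (120, 240), 3.0: (120, 320)}
stick_color = lab[400, 200]  # pixel sampled from the test strip

# Nearest reference field in Lab space approximates perceptual distance:
value = min(palette, key=lambda v: np.linalg.norm(lab[palette[v]] - stick_color))
print(f"estimated water quality value: {value}")
```

Locating the four corners themselves is left open here; OpenCV contour detection or fiducial markers around the palette are natural starting points.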

21 of 38

Resources

  • Photographs of simulated water quality measurements by citizen scientists.
    • https://tinyurl.com/yd323tre

22 of 38

Challenge 6:

«Adding and Correcting Entities in executive minutes»

by Tobias Hodel, Staatsarchiv Kanton Zürich

23 of 38

Problem

150’000 pages of handwritten executive minutes (Regierungsratsprotokolle, 1803-1883) have been transcribed by students.

The documents inform us about high politics and daily life in 19th-century Zürich.

To enhance usability, entities (persons, places, organizations, etc.) need to be identified (and thus made searchable).

24 of 38

What we need

For starters, entities have been identified automatically (using a fixed list of places and persons); now these entities need to be checked and missing entities added. A gazetteer sketch follows the steps.

  • Step 1: create an interface to access and alter the documents
  • Step 2: add some text
  • Step 3: make it run on mobile devices

  • Step 4: add some gamification (swiping)
  • Step 5: add linking to authority data (Wikidata/GND/…)
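
A minimal gazetteer-matching sketch in the spirit of the automatic pass: flag known places in a transcribed sentence so an interface can offer the spans for confirmation or correction. The place list and sentence are illustrative:

```python
# A minimal sketch: match tokens against a fixed place list and emit
# candidate entity spans for an interface to confirm or reject.
import re

places = {"Zürich", "Winterthur", "Uster"}  # illustrative fixed list
text = "Der Regierungsrat beschliesst, die Strasse nach Winterthur auszubauen."

candidates = []
for match in re.finditer(r"\w+", text):
    if match.group() in places:
        candidates.append({"start": match.start(), "end": match.end(),
                           "surface": match.group(), "type": "place"})
print(candidates)  # spans to confirm or reject, e.g. by swiping in the UI
```

Confirmed spans could then be linked to authority data (step 5) by querying Wikidata or the GND for the surface form.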

25 of 38

Resources

Document Dump

ZENODO: https://doi.org/10.5281/zenodo.803239

About the project

http://www.staatsarchiv.zh.ch/internet/justiz_inneres/sta/de/ueber_uns/organisation/editionsprojekte/tkr.html

Sample Document

http://www.archives-quickaccess.ch/stazh/rrb/ref/MM+1.101+RRB+1827/0874

26 of 38

Challenge 7:

«The RefBank Challenge: How to clean and de-duplicate one million bibliographic references?»

by Guido Sautter, Lead Software Developer at Plazi

gsautter@gmail.com @gsautter

27 of 38

Problem

The RefBank Corpus

  • Started in 2011 to collect bibliographic references
(plain strings, optionally with parsed versions attached)
  • Standalone Java web application (servlet-based) with relational DB
  • Open network of (pull-) replicating nodes
  • Web UI plus XML-based REST API
  • GPLv3 (https://github.com/VBRANT/refbank)

1,146,552 distinct reference strings (character by character)
1,026,753 distinct reference string clusters (abstracting case, accents, spaces, and punctuation marks)
(as of Oct 19th, 2018)

28 of 38

What we need

  • Cluster non-trivial duplicates (different representations of the same reference):
    • Basis for building citation networks
  • Extract individual author names (as strings):
    • Basis for identifying authors as persons (assign ORCID, etc.)
    • Basis for building co-author networks (also to help with above)
  • Extract journal names:
    • Create thesaurus
    • Help with detail parsing

These three references describe the same work … recognize them as a cluster (a similarity sketch follows the examples):

Baroni Urbani, C. (1980) The first fossil species of the Australian ant genus Leptomyrmex in amber from the Dominican Republic. Stuttgarter Beiträge zur Naturkunde, Serie B, 62, 1-10.

Baroni Urbani, C. (1980): The first fossil species of the Australian ant genus Leptomyrmex in amber from the Dominican Republic. (Amber collection Stuttgart: Hymenoptera, Formicidae. III: Leptomyrmicini.).: 1-10

Baroni Urbani, C. (1980): The first fossil species of the Australian ant genus Leptomyrmex in amber from the Dominican Republic. (Amber Collection Stuttgart: Hymenoptera, Formicidae. III: Leptomyrmicini). Stuttgarter Beiträge zur Naturkunde. Serie B (Geologie und Paläontologie) 62: 1-10
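
A minimal similarity sketch: normalize reference strings to token sets and link pairs with high Jaccard overlap. The two references reused below come from the examples above; the similarity threshold is an assumption to tune on labeled pairs:

```python
# A minimal sketch: cluster references by token-set overlap.
import re
from itertools import combinations

def tokens(ref):
    # Lower-case, keep word tokens, drop very short ones:
    return {t for t in re.findall(r"\w+", ref.lower()) if len(t) > 2}

def jaccard(a, b):
    return len(a & b) / len(a | b)

refs = [
    "Baroni Urbani, C. (1980) The first fossil species of the Australian ant genus Leptomyrmex in amber from the Dominican Republic. Stuttgarter Beiträge zur Naturkunde, Serie B, 62, 1-10.",
    "Baroni Urbani, C. (1980): The first fossil species of the Australian ant genus Leptomyrmex in amber from the Dominican Republic. (Amber collection Stuttgart: Hymenoptera, Formicidae. III: Leptomyrmicini.).: 1-10",
]
for (i, a), (j, b) in combinations(enumerate(refs), 2):
    sim = jaccard(tokens(a), tokens(b))
    print(i, j, f"similarity={sim:.2f}", "-> same cluster" if sim > 0.5 else "")
```

Author and journal extraction could then run per cluster, so that consistent fields across cluster members reinforce each other.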

29 of 38

Resources

Data

SQL Dump: http://plazi.cs.umb.edu/RefBank/RefBankDB.sql.gz

WebApp

Dump&Run Pack: http://plazi.cs.umb.edu/RefBank/RefBank.zip

Install Guide: http://plazi.cs.umb.edu/RefBank/static/downloadRefBank.html

30 of 38

Challenge 8:

«Looking for the WOW Wikidata query»

(by Cristina Sarasua, Universität Zürich)

31 of 38

Problem

Wikidata, the free knowledge base that anyone can use and edit, is growing fast and is heavily used.

How to showcase what one can do with Wikidata, and teach others how to use it?

Let’s see what the crowd asks for and spot WOW queries.

*This would help Wikidata Facts and the Query Examples page!

51+ Million Data Items

749+ Million Edits

SPARQL Query Service: https://query.wikidata.org/

32 of 38

What we need: Mining SPARQL queries by the crowd

  • What kinds of queries did people run? Create summary statistics based on e.g. the items and properties used, length, complexity, type of visualization, the number of results they return ...
  • What are the characteristics of WOW queries? Is novelty the key feature, or interestingness, or unexpectedness?
  • Manually pick and describe concrete WOW queries from the data set
  • Code a tool that (1) smartly picks a query and (2) asks several people to rate it?

Tip: Get a subset of organic (human-issued, non-robotic) queries. A log-mining sketch follows below.
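
A minimal log-mining sketch for the summary-statistics step, assuming a gzipped TSV with a percent-encoded SPARQL query in the first column (check the data set’s documentation; the file name and column position are assumptions):

```python
# A minimal sketch: tally which Wikidata properties the logged SPARQL
# queries use. File layout is an assumption.
import gzip
import re
from urllib.parse import unquote

counts = {}
with gzip.open("wikidata_sparql_logs.tsv.gz", "rt") as f:  # hypothetical file
    for line in f:
        query = unquote(line.split("\t")[0])
        # Count each property (P31, P279, ...) once per query:
        for prop in set(re.findall(r"P\d+", query)):
            counts[prop] = counts.get(prop, 0) + 1

for prop, n in sorted(counts.items(), key=lambda kv: -kv[1])[:20]:
    print(prop, n)
```

The same loop extends naturally to query length, item usage, and other features that might separate WOW queries from routine ones.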

33 of 38

Resources

Learning to Query Wikidata with SPARQL

Notebook showcasing some SPARQL & Wikidata features: https://tinyurl.com/y9hrpmad

Millions of SPARQL Queries Executed

Anonymised data set released by WMDE & TU Dresden: https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en

Related Publication: by Malyshev et al. at ISWC 2018 https://tinyurl.com/yc2u6a9m

General Wikidata Access: https://www.wikidata.org/wiki/Wikidata:Data_access

PAWS (your Jupyter instance by Wikimedia): https://paws.wmflabs.org/

Gastrodon Library, SPARQL Jupyter Kernel

Happy to give a SPARQL 101 introduction!

34 of 38

Challenge 9:

«OpenStreetMap Location Classification»

(by Sustainable FinTech & Carbon Delta)

35 of 38

Problem

OpenStreetMap location data has been instrumental in crafting a database of a company’s physical assets, which we use to quantify the company’s exposure to physical risks due to changing weather patterns.

Currently, our database lacks any kind of classification data and we must treat all locations as equals, despite knowing that sometimes a location represents a tiny retail outlet while at other times it is a high-throughput factory.


36 of 38

What we need

We need a program that, given a high-precision latitude and longitude as input, can find the corresponding installation on OpenStreetMap and classify it as a factory, farm, logistics, retail, office, etc., and ideally give further information such as its size (in m^2). Assume the mapping to a company has already been done. (A lookup sketch follows the examples below.)

  • Nespresso has a factory in Romont and many retail locations throughout CH
  • ABB has factories and office buildings throughout Europe
  • Countless other examples; all locations can be found on OpenStreetMap
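
A minimal lookup sketch: fetch OSM building ways within 100 m of a coordinate via the Overpass API and read classification-relevant tags. The coordinate is an illustrative point near Romont (FR), not a verified factory location:

```python
# A minimal sketch: query OSM features around a coordinate and print the
# tags a classifier would use. The coordinate is illustrative.
import requests

lat, lon = 46.6964, 6.9186  # illustrative point near Romont
query = f"""
[out:json][timeout:60];
way(around:100,{lat},{lon})["building"];
out tags;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data=query)
resp.raise_for_status()

for way in resp.json()["elements"]:
    tags = way.get("tags", {})
    # Tags such as building=industrial, shop=*, office=* drive classification:
    print(tags.get("building"), tags.get("shop"), tags.get("office"))
```

Estimating footprint size (m^2) would additionally need the way geometry (`out geom;` in Overpass QL) and a planar area computation.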


37 of 38

Resources

Sites to Crawl

OpenStreetMap: https://www.openstreetmap.org

OSM API: https://wiki.openstreetmap.org/wiki/API

OpenStreetMap tools

OSM Wiki: https://wiki.openstreetmap.org/wiki/Map_Features#Shop

OSM Wiki: https://wiki.openstreetmap.org/wiki/Key:designation

38 of 38

Thanks!