Open Data Ecosystems

Raphael Volz

June 12, 2017

About Me

  • Professor for Applied Computer Science
  • Lectures in Data Science, Cloud Computing
  • Founder and Chairman of Volz Innovation GmbH (consulting firm specializing in IT strategy, data science and cloud computing)

Talk Outline

  • Introduction
    • What is open data?
    • What is a data ecosystem?
  • Government-driven open data ecosystems
    • data.gov (US)
    • European Data Portal
  • Community-driven open data ecosystems
    • Wikidata
    • OpenStreetMap
  • Principles for successful Open Data Ecosystems

Introduction

Burger King Ad

What is open data?

Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.

Source: http://opendefinition.org/

What is a data ecosystem?

A community of interacting organizations and individuals that produce, use and reuse a set of data. The dataset is the keystone around which applications and services provide value and thereby become part of the data ecosystem.

Ecosystem members can have various roles. Common roles are contributor, supplier, aggregator, enabler, enricher, developer as well as the common user.

Source: Own definition

High-level view

Source: WHO eHealth

Technical sketch of a data ecosystem

Source: CC-BY-SA 2.0 Rufus Pollock, 2011

Government-Driven Open Data Ecosystems

10 open data principles

  • In October 2007, 30 open government advocates met in Sebastopol, California, to discuss how government could open up its data for public use.
  • Federal and state governments had made some data available to the public, but usually inconsistently and incompletely; more and better data was needed.
  • The conference led to 8 (later 10) principles for open data.

Source: Sunlight Foundation

10 open data principles

  1. Completeness
  2. Primacy
  3. Timeliness
  4. Ease of Access
  5. Machine readability
  6. Non-discrimination
  7. Use of commonly owned standards
  8. Licensing
  9. Permanence
  10. Usage Costs

Source: Sunlight Foundation

data.gov

Source: https://www.data.gov

Case Study SpotCrime

Source: SpotCrime.com

Open Data Charter

5 Principles agreed to at 2013 G8 Summit

  1. Open Data by default
  2. Quality and Quantity
  3. Usable by All
  4. Releasing data for improved governance
  5. Releasing data for innovation

Source: Open Data Charter

European Public Sector Information (PSI) Directive

  • 2003: Directive 2003/98/EC adopted to foster the reuse of public data in all member states
  • 2013: revision by Directive 2013/37/EU, which mainly introduced
    • open by default principle
    • a move away from cost-based charging toward marginal-cost-oriented fees with transparent calculation
    • inclusion of cultural institutions as public bodies

5 PSI priority domains

  • geospatial and earth observation data
  • environmental data
  • transport data
  • statistical data
  • company data

European Data Portal

Source: European Data Portal

European Open Data Indicators

Source: European Data Portal

European Open Data Indicators

Source: European Data Portal

European Open Data Dissemination

Source: European Data Portal

Case Study: Fuel Prices in Germany

  • Since September 2013, companies operating a public fuel station must report prices to the German antitrust agency (Bundeskartellamt) in real time
  • Increase price transparency
  • “Improve the Bundeskartellamts’ possibilities to intervene in the case of illegal predatory strategies and other forms of market power abuse”
  • Open Data published on a portal run by Ministry of Transport (MDM Portal)
  • Data basis for (many) fuel-price apps

Case Study: Fuel Prices Data Set

  • 3 fuel types (E5, E10, Diesel)
  • 14,957 fuel stations
  • 30,231,752 price changes in one year (Jul 2014 - Jun 2015)
  • 82,827 price changes per day
  • 5.6 price changes per station per day
  • Number of price changes kept increasing in 2015/16 and 2016/17
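
The per-day and per-station figures above follow from the yearly total by simple division. A quick check in Python, assuming the 12-month window covers 365 days and the number of stations stayed constant over the year (neither is stated on the slide):

```python
# Figures copied from the slide. The 365-day window and the constant
# station count are assumptions made for this back-of-the-envelope check.
PRICE_CHANGES_PER_YEAR = 30_231_752
STATIONS = 14_957
DAYS = 365  # Jul 2014 - Jun 2015 contains no leap day

changes_per_day = PRICE_CHANGES_PER_YEAR / DAYS
changes_per_station_per_day = changes_per_day / STATIONS

print(f"price changes per day: {changes_per_day:,.0f}")
print(f"price changes per station per day: {changes_per_station_per_day:.2f}")
```

With a fixed station count the per-station value comes out marginally below the 5.6 quoted above. One possible explanation is that fewer stations were reporting during part of the year, but the slide does not say.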

Case Study: Fuel Prices Analysis

Conclusion

  • Open data is official policy in all G8 countries and EU member states
  • Initiatives are under way and making measurable progress
  • Many data sets already demonstrate great economic and societal value
  • Major issues:
    • Hard to link data (no global record identifiers)
    • Community-building and dissemination
    • Contribution to data mostly impossible (data published and used “one way street”)

Community-Driven Open Data Ecosystems

Wikidata

Wikipedia Infoboxes

Source: Karl Benz on Wikipedia

Wikidata Purpose

  • Centralize the facts from Wikipedia info boxes
  • For reuse across 300 Wikipedia languages
  • e.g. 78 articles about Zika had different infoboxes
  • For querying and use by third party apps
  • Improve interwiki links

Wikidata properties

  • a knowledge graph based on items
  • free and open
  • collaborative
  • multilingual
  • manually curated (unlike DBpedia)

Knowledge graph

People filmed with Jim Carrey

Source: Wikidata Graph Builder

Items have properties

Karl Benz

Source: Reasonator

Wikidata data model

Source: Wikidata Data Model Primer

Wikidata bulk downloads

Wikidata offers copies of its database

These dumps can be loaded with Wikidata Toolkit.
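
Wikidata Toolkit is a Java library; for a first look at a dump, a few lines of Python are often enough. The sketch below assumes the JSON dump layout in which the file is one large JSON array with one entity per line. The function name `iter_entities` and the tiny in-memory sample are illustrative and not part of any Wikidata tooling.

```python
import io
import json
from typing import IO, Iterator


def iter_entities(lines: IO[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    Assumes the dump is a JSON array with one entity per line, so it can
    be streamed without loading the whole file into memory.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip the array brackets and blank lines
        yield json.loads(line)


# Tiny hand-made sample in the assumed layout (not real dump content).
sample = io.StringIO(
    '[\n'
    '{"id": "Q1", "type": "item", "labels": {"en": {"language": "en", "value": "universe"}}},\n'
    '{"id": "Q2", "type": "item", "labels": {"en": {"language": "en", "value": "Earth"}}}\n'
    ']\n'
)

entities = list(iter_entities(sample))
for entity in entities:
    print(entity["id"], entity["labels"]["en"]["value"])
```

For a real, compressed dump the same function could be fed from `gzip.open(path, "rt", encoding="utf-8")`, where `path` is wherever the downloaded file was saved.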

Wikidata size

  • 26.3 million items
  • 150 million statements about items
  • 500 million edits since launch
  • Currently more than 17,000 active users
  • Detailed statistics are available

Wikidata ecosystem

Wikidata Query service

What are the 10 largest cities with a female mayor?


SELECT DISTINCT  ?cityLabel ?population ?mayorLabel 
WHERE 
{
    ?city wdt:P31/wdt:P279* wd:Q515 .  # find instances of subclasses of city
    ?city p:P6 ?statement .            # with a P6 (head of government) statement
    ?statement ps:P6 ?mayor .           # ... that has the value ?mayor
    ?mayor wdt:P21 wd:Q6581072 .       # ... where the ?mayor has P21 (sex or gender) female
    FILTER NOT EXISTS { ?statement pq:P582 ?x }  # ... but the statement has no P582 (end date) qualifier
     
    # Now select the population value of the ?city
    # (wdt: properties use only statements of "preferred" rank if any, usually meaning "current population")
    ?city wdt:P1082 ?population .
    # Optionally, find English labels for city and mayor:
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
}
ORDER BY DESC(?population)
LIMIT 10
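
Queries like the one above can also be sent to the query service from a program rather than the web form. The following is a minimal sketch using only the Python standard library. It assumes the public SPARQL endpoint at `https://query.wikidata.org/sparql` and the standard SPARQL JSON result format; the function names and the `User-Agent` string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical client name; use one that identifies your own tool.
        "User-Agent": "open-data-ecosystems-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def rows(result: dict) -> list[dict]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run(query: str) -> list[dict]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return rows(json.load(response))
```

Calling `run(query_text)` with the query text above would return one dict per city with the keys `cityLabel`, `population` and `mayorLabel`. Nothing is sent over the network until `run` is called.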

Wikidata Query Service

List of space probes with pictures.

SELECT ?link ?date ?picture
WHERE {
  ?link wdt:P31 wd:Q26529 ;
        wdt:P18 ?picture ;
        wdt:P619 ?date .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "fr,en" .
  }
}
ORDER BY ?date
LIMIT 100


Wikidata Query Service

Map of Lighthouses in Norway

#defaultView:Map
SELECT DISTINCT ?item ?itemLabel ?coor ?image
WHERE {
  ?item wdt:P31 wd:Q39715 .
  ?item wdt:P17 wd:Q20 .
  OPTIONAL { ?item wdt:P625 ?coor }
  OPTIONAL { ?item wdt:P18 ?image }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nb,nn,en,fi" .
  }
}
ORDER BY ?itemLabel


OpenStreetMap (OSM)

OSM Background

  • OpenStreetMap (OSM) was created in July 2004 by Steve Coast, at the time a student at University College London.
  • He did not understand why the Ordnance Survey created massive geographical datasets but did not freely distribute them to those who had paid for them.
  • GeoData is only freely available from national authorities in the US and the Netherlands

OSM Properties

  • Collaborative
    • maintained by individual contributors
    • Wikipedia principle, everyone can edit and contribute
  • plus bulk data imports in the past (particularly Eastern Europe)
  • plus robots cleaning data
  • Open Data, free to use (under the ODbL license)
  • Not a map, but a database

OSM Data structures

  • Elements (data primitives): basic components in OSM from which nodes, ways, relations inherit
  • Nodes: basic geographic point.
    • Geographic point: latitude & longitude (WGS84)
    • Point Of Interest (POIs)
  • Ways: ordered interconnection of nodes
    • open ways = linear features (roads, railways…)
    • closed ways = areas
  • Relations: group of any primitive with associated roles
    • Relate nodes, ways and potentially other relations to each other,
    • thereby forming complex objects (multipolygons).
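
The three primitives can be pictured as plain data classes. This is only an illustrative sketch of the structure described above, not the schema of any particular OSM tool; the class layout, field names and ids are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float  # WGS84 latitude in degrees
    lon: float  # WGS84 longitude in degrees
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list[int]  # ordered references to nodes
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        """A way is closed when it ends on the node it started from."""
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int   # id of the referenced element
    role: str  # e.g. "outer" or "inner" in a multipolygon


@dataclass
class Relation:
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


# Made-up ids, for illustration only.
road = Way(id=1, node_ids=[10, 11, 12], tags={"highway": "residential"})
pond = Way(id=2, node_ids=[20, 21, 22, 20], tags={"natural": "water"})
lake = Relation(id=3, members=[Member("way", 2, "outer")], tags={"type": "multipolygon"})
print(road.is_closed(), pond.is_closed())
```

Whether a closed way is actually meant as an area also depends on its tags (a roundabout is closed but still a linear feature), which this sketch does not model.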

OSM Elements

Each OSM entity (node, way, relation) has:

  • a numeric identifier: OSM ID
  • a set of generic attributes present for every element
    • uid, user: user id and user name
    • timestamp: time of the last modification
    • visible: if false then the element should only be returned by history calls
    • version: edit version of the object (starts from 1)
    • changeset: the changeset (group of edits made within a certain time by one user) in which the object was created or updated
  • a set of tags (key-value pairs)
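
In the OSM XML format these generic attributes appear as XML attributes on each element, and the tags as nested `tag` elements with `k` and `v` attributes. A small sketch with the Python standard library; the sample document is hand-written and all its values are made up.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the OSM XML layout; all values are made up.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="6.9271" lon="79.8612" version="3" changeset="42"
        uid="7" user="example_mapper" visible="true"
        timestamp="2017-06-01T12:00:00Z">
    <tag k="place" v="city"/>
    <tag k="name" v="Colombo"/>
  </node>
</osm>"""


def parse_nodes(xml_text: str) -> list[dict]:
    """Return each node's generic attributes plus its tags as a dict."""
    nodes = []
    for element in ET.fromstring(xml_text).iter("node"):
        node = dict(element.attrib)  # id, version, changeset, uid, ...
        node["tags"] = {tag.get("k"): tag.get("v") for tag in element.iter("tag")}
        nodes.append(node)
    return nodes


nodes = parse_nodes(SAMPLE)
print(nodes[0]["id"], nodes[0]["version"], nodes[0]["tags"])
```

All attribute values arrive as strings; a real importer would convert ids, coordinates and timestamps to proper types.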

OSM Tags / Ontology

  • key-value pairs
  • e.g. highway=residential
  • use of tags and values is not restricted
  • defines the basic ontology of OSM
  • see taginfo

Accessing OSM data

  • Convenient browsing through the web site based on a map view
  • Download the dataset
    • Warning: Heavy lifting and time consuming
    • Import Planet (58 GB gzipped) into Postgres + PostGIS
    • After the bulk import, apply the minutely diffs published for the data
    • Use the Osmosis tool and a cron job to automate this
  • Use data services available in the data ecosystem

Querying OSM with Postgres

Get location of Colombo.

SELECT name, place, ST_XMin(way), ST_YMin(way)
FROM planet_osm_point WHERE name='Colombo' AND place='city';

Get the road network.

SELECT ST_SimplifyPreserveTopology(way, 5000), highway
FROM planet_osm_line
WHERE highway IN ('motorway', 'trunk', 'primary', 'secondary');

OSM statistics

Metric        July 2016        June 2017
Users         2,867,221        3,954,309
Nodes         3,463,959,970    3,926,828,147
Ways          360,469,340      416,654,804
Relations     4,387,699        5,043,226
GPS traces    5,280,183,660    5,715,425,150

Top user has contributed 326,511,847 (6%) GPS traces.
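
The year-over-year growth behind these numbers, and the top contributor's share, can be recomputed directly from the table. A short sketch in Python with the figures copied from above; the helper name `growth_percent` is illustrative.

```python
# Figures copied from the table above: (July 2016, June 2017).
stats = {
    "Users": (2_867_221, 3_954_309),
    "Nodes": (3_463_959_970, 3_926_828_147),
    "Ways": (360_469_340, 416_654_804),
    "Relations": (4_387_699, 5_043_226),
    "GPS traces": (5_280_183_660, 5_715_425_150),
}


def growth_percent(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


for metric, (before, after) in stats.items():
    print(f"{metric:<11} {growth_percent(before, after):5.1f} %")

# Share of the top contributor in all GPS traces as of June 2017.
top_user_share = 326_511_847 / stats["GPS traces"][1] * 100
print(f"top user share: {top_user_share:.1f} %")
```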

Source

OSM Open Data Ecosystem

  • Many applications using the data
  • Many services based on the data
  • Many (open source) tools for handling the data
  • Primary application areas
    • Map Rendering (One Dataset, several renderings)
    • Geo Search (POI, (Reverse) Name Resolution)
    • Routing
    • Geographic Database
    • Data Editors

Map Rendering

Map Compare

Source: F4 Map

Routing

Data Service - Overpass

Example Query: Chinese Restaurants on the map

node
  [amenity=restaurant]
  [cuisine=chinese]
  ({{bbox}});
out;
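
`{{bbox}}` is a shortcut of the Overpass Turbo web front end, which replaces it with the current map view. When the Overpass API is called from a program, the bounding box has to be written out as (south, west, north, east). The sketch below builds such a query and the corresponding request URL. It assumes the public instance at `https://overpass-api.de/api/interpreter`; the function names and the coordinates are illustrative.

```python
import urllib.parse

# One public Overpass API instance; other instances exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def restaurant_query(south: float, west: float, north: float, east: float,
                     cuisine: str = "chinese") -> str:
    """Overpass QL for restaurant nodes of one cuisine inside a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json];\n"
        "node\n"
        "  [amenity=restaurant]\n"
        f"  [cuisine={cuisine}]\n"
        f"  ({bbox});\n"
        "out;"
    )


def request_url(query: str) -> str:
    """URL that passes the query to the API in the 'data' parameter."""
    return OVERPASS_URL + "?" + urllib.parse.urlencode({"data": query})


# Rough bounding box around central Colombo (illustrative coordinates).
query = restaurant_query(6.85, 79.82, 6.98, 79.90)
print(query)
print(request_url(query))
```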


Overpass Query: Streets on the map

way({{bbox}})
  [highway]
  [name];
out;


Case Study WheelMap.org

Source: http://wheelmap.org

Case Study WheelMap.org

Source: http://wheelmap.org

Case Study WheelMap.org

Source: OpenStreetMap

Conclusion

  • Community-driven data ecosystems thriving
  • High and increasing participation
  • Contribution to data sets at the core of the ecosystem
  • Major issues:
    • Incompatibility of licenses (ODbL vs. CC)
    • No global identifiers (linking data sets is still hard), Wikidata providing a basis for bridging ids
    • New community-driven data ecosystems will probably be domain-focused (winner takes all)
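
One way Wikidata already bridges identifiers is the `wikidata` tag that OSM elements can carry, which holds the Q-number of the matching Wikidata item. A minimal sketch of such a join in Python; the records and OSM ids are made up for illustration, and only the two Q-numbers (Q64 for Berlin, Q90 for Paris) are real.

```python
# Illustrative records; the OSM ids and the village are made up.
osm_elements = [
    {"osm_id": 101, "tags": {"name": "Berlin", "place": "city", "wikidata": "Q64"}},
    {"osm_id": 102, "tags": {"name": "Paris", "place": "city", "wikidata": "Q90"}},
    {"osm_id": 103, "tags": {"name": "Some village", "place": "village"}},
]
wikidata_items = {
    "Q64": {"label": "Berlin"},
    "Q90": {"label": "Paris"},
}


def link(elements: list[dict], items: dict[str, dict]) -> list[tuple]:
    """Pair each OSM element with the Wikidata item its tag points to."""
    linked = []
    for element in elements:
        qid = element["tags"].get("wikidata")
        if qid in items:  # elements without the tag are skipped
            linked.append((element["osm_id"], qid, items[qid]["label"]))
    return linked


linked = link(osm_elements, wikidata_items)
for osm_id, qid, label in linked:
    print(osm_id, qid, label)
```

Elements without a `wikidata` tag simply stay unlinked, which is the gap the "no global identifiers" issue above refers to.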

Success Factors for Open Data Ecosystems

Success Factors

  • Adoption of emerging best practices
  • Setting of a priority domain
  • Linkable Data using global identifiers
  • Extensible Data allowing for third party contributions
  • Feedback Cycles including versioning
  • Progress tracking with statistics on agreed-upon metrics
  • Community Management and Dissemination

Data ecosystems are all about people

Source: https://en.wikipedia.org/wiki/Magnus_Manske

Thank you for your attention! Questions?