Open Data Ecosystems

raphael.volz@hs-pforzheim.de

Nov 30, 2017

About Me

  • Professor for Applied Computer Science
  • Faculty of Engineering
  • Research Foci:
    • Data Science
    • Cloud Computing

Talk Outline

  • Introduction
    • What is open data?
    • What is a data ecosystem?
  • Community-driven open data ecosystems
    • Wikidata
    • OpenStreetMap
  • Principles for successful Open Data Ecosystems

Introduction

Burger King Ad

What is open data?

Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.

Source: http://opendefinition.org/

What is a data ecosystem?

A community of interacting organizations and individuals that produce, use and reuse a set of data. The dataset is the keystone around which applications and services provide value and thereby become part of the data ecosystem.

Ecosystem members can have various roles. Common roles are contributor, supplier, aggregator, enabler, enricher, developer as well as the common user.

Source: Own definition

High-level view

Source: WHO eHealth

Community-Driven Open Data Ecosystems

Wikidata

Wikipedia Infoboxes

Source: Karl Benz on Wikipedia

Wikidata Purpose

  • Centralize the facts from Wikipedia infoboxes
  • For reuse across 300 Wikipedia languages
  • e.g. 78 articles about Zika had different infoboxes
  • For querying and use by third party apps
  • Improve interwiki links

Wikidata properties

  • a knowledge graph based on items
  • free and open
  • collaborative
  • multilingual
  • manually curated ( unlike DBpedia )

Knowledge graph

People filmed with Jim Carrey

Source: Wikidata Graph Builder

Items have properties

Karl Benz

Source: Reasonator

Wikidata data model

Source: Wikidata Data Model Primer
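
The data model can be sketched in a few lines of Python: an item carries labels and statements, and each statement has a property, a value, optional qualifiers and a rank. This is a minimal illustration, not Wikidata's actual implementation. The class names, the `best_values` helper, the item ID `Q0000` and the population figures are assumptions made up for this example; P1082 (population) appears in the query later in the talk, and P585 is the "point in time" property.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Statement:
    """One claim about an item: property, value, qualifiers and a rank."""
    property_id: str
    value: Any
    qualifiers: Dict[str, Any] = field(default_factory=dict)
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


@dataclass
class Item:
    """An item with multilingual labels and a list of statements."""
    item_id: str
    labels: Dict[str, str] = field(default_factory=dict)
    statements: List[Statement] = field(default_factory=list)

    def best_values(self, property_id: str) -> List[Any]:
        """Return values of the best-ranked, non-deprecated statements."""
        candidates = [s for s in self.statements
                      if s.property_id == property_id and s.rank != "deprecated"]
        preferred = [s for s in candidates if s.rank == "preferred"]
        # Fall back to normal-rank statements when nothing is preferred.
        return [s.value for s in (preferred or candidates)]


# Hypothetical city item: the ID and the numbers are invented for illustration.
city = Item(
    item_id="Q0000",
    labels={"en": "Example City", "de": "Beispielstadt"},
    statements=[
        Statement("P1082", 100_000, qualifiers={"P585": "2010"}),
        Statement("P1082", 120_000, qualifiers={"P585": "2017"}, rank="preferred"),
    ],
)

print(city.best_values("P1082"))
```

Selecting the preferred statement when one exists mirrors the rank handling that the query service applies to its direct-property shortcuts.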

Wikidata size (June vs. November 2017)

  • 38.6 million items (June: 26.3 million)
  • 326 million statements about items (June: 150 million)
  • 597 million edits have been made since launch (June: 500 million)
  • Currently 18.6 thousand active users (June: >17 thousand)
  • See the detailed statistics for growth over time

Wikidata ecosystem

Wikidata Query Service

What are the 10 largest cities with a female mayor?

SELECT DISTINCT  ?cityLabel ?population ?mayorLabel 
WHERE 
{
    ?city wdt:P31/wdt:P279* wd:Q515 .  # find instances of subclasses of city
    ?city p:P6 ?statement .            # with a P6 (head of government) statement
    ?statement ps:P6 ?mayor .           # ... that has the value ?mayor
    ?mayor wdt:P21 wd:Q6581072 .       # ... where the ?mayor has P21 (sex or gender) female
    FILTER NOT EXISTS { ?statement pq:P582 ?x }  # ... but the statement has no P582 (end date) qualifier
     
    # Now select the population value of the ?city
    # (wdt: properties use only statements of "preferred" rank if any, usually meaning "current population")
    ?city wdt:P1082 ?population .
    # Optionally, find English labels for city and mayor:
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
}
ORDER BY DESC(?population)
LIMIT 10
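
A query like the one above can also be sent programmatically. The following is a minimal sketch using only the Python standard library, assuming the public endpoint `https://query.wikidata.org/sparql` and the standard SPARQL JSON results layout. The helper names, the User-Agent string and the sample payload are made up for illustration; the sample is not real query output.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(sparql: str) -> str:
    """Encode the query as a GET request that asks for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})


def parse_bindings(payload: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [{var: cell["value"] for var, cell in row.items()}
            for row in payload["results"]["bindings"]]


def run_query(sparql: str,
              user_agent: str = "open-data-talk-example/0.1") -> List[Dict[str, str]]:
    """Send the query over the network (not called in this sketch)."""
    request = urllib.request.Request(build_url(sparql),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))


# Hand-written sample in the SPARQL JSON results layout (invented values).
SAMPLE = {
    "head": {"vars": ["cityLabel", "population"]},
    "results": {"bindings": [
        {"cityLabel": {"type": "literal", "value": "Example City"},
         "population": {"type": "literal", "value": "123456"}},
    ]},
}

rows = parse_bindings(SAMPLE)
print(rows)
```

Calling `run_query` with the query text would return rows keyed by `cityLabel`, `population` and `mayorLabel`; all values arrive as strings and need converting before numeric use.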

Wikidata Queries - DefaultViews

Timeline of space probe launches.

#defaultView:Timeline
SELECT ?item ?itemLabel ?launchdate (SAMPLE(?image) AS ?image)
WHERE
{
    ?item wdt:P31 wd:Q26529 .
    ?item wdt:P619 ?launchdate .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    OPTIONAL { ?item wdt:P18 ?image }
}
GROUP BY ?item ?itemLabel ?launchdate

Wikidata Queries - Maps

Map of Lighthouses in Norway

OpenStreetMap (OSM)

OSM Background

  • OpenStreetMap (OSM) was created in July 2004 by Steve Coast (a UCL student)
  • He did not understand why the Ordnance Survey created massive geographical datasets but did not freely distribute them to those who had paid to create them
  • Geodata was freely available only in some countries, e.g. the US and the Netherlands

OSM Properties

  • Collaborative
    • maintained by individual contributors
    • Wikipedia principle, everyone can edit and contribute
  • Donated data sets imported in bulk (particularly Eastern Europe)
  • Automated robots cleaning data
  • Open Data, free to use (under the ODbL)
  • Not a map, but a database

OSM hypergraph

  • Nodes: basic geographic point.
    • Geographic point: latitude & longitude (WGS84)
    • Point Of Interest (POIs)
  • Ways: ordered interconnection of nodes
    • open ways = linear features (roads, railways…)
    • closed ways = areas
  • Relations: group of any primitive with associated roles
    • Relate nodes, ways and potentially other relations to each other,
    • thereby forming complex objects (multipolygons)
  • Nodes, ways, relations are versioned and user attributed
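
The three primitives can be sketched as small Python data classes. This is a simplified illustration rather than the OSM API's own representation: the class and field names are assumptions, the IDs and coordinates are invented, and `is_closed` follows the slide's simplification that a way whose first and last node coincide describes an area.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A geographic point (WGS84 latitude and longitude)."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node references."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        """A way is closed when it ends on the node it started from."""
        return len(self.node_ids) > 1 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    """A relation member: a reference to a primitive plus its role."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str


@dataclass
class Relation:
    """A group of primitives with roles, e.g. a multipolygon."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


# Invented example data.
road = Way(id=10, node_ids=[1, 2, 3], tags={"highway": "residential"})
building = Way(id=11, node_ids=[4, 5, 6, 4], tags={"building": "yes"})
courtyard = Way(id=12, node_ids=[7, 8, 9, 7])
multipolygon = Relation(
    id=20,
    members=[Member("way", 11, "outer"), Member("way", 12, "inner")],
    tags={"type": "multipolygon"},
)

print(road.is_closed(), building.is_closed())
```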

OSM Elements

Each OSM entity (node, way, relation) has:

  • a numeric identifier: OSM ID
  • a set of generic attributes present for every element
    • uid, user: user id and user name
    • timestamp: time of the last modification
    • visible: if false then the element should only be returned by history calls
    • version: edit version of the object (starts from 1)
    • changeset: the changeset (group of edits made within a certain time by one user) in which the object was created or updated
  • a set of tags (key-value pairs)
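
These attributes are visible directly in the OSM XML format. The sketch below parses one node with Python's standard `xml.etree.ElementTree`; the sample element, its ID, user name and coordinates are invented for illustration, and `describe` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of OSM XML; not a real map element.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="48.8900" lon="8.7000" version="3" changeset="100"
        timestamp="2017-11-01T12:00:00Z" uid="42" user="example_mapper"
        visible="true">
    <tag k="amenity" v="restaurant"/>
    <tag k="cuisine" v="chinese"/>
  </node>
</osm>"""


def describe(element: ET.Element) -> dict:
    """Split an OSM element into its generic attributes and its tags."""
    meta = dict(element.attrib)  # id, version, changeset, timestamp, uid, user, ...
    tags = {tag.get("k"): tag.get("v") for tag in element.findall("tag")}
    return {"type": element.tag, "meta": meta, "tags": tags}


root = ET.fromstring(SAMPLE)
node = describe(root.find("node"))
print(node["type"], node["meta"]["version"], node["tags"])
```

All attribute values are returned as strings, so `version` and `uid` need converting to integers before comparison.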

OSM Tags / Ontology

  • key-value pairs
  • e.g. highway=residential
  • use of tags and values is not restricted
  • defines the basic ontology of OSM
  • see taginfo

OSM statistics

Metric        July 2016        June 2017        Nov 2017
Users         2,867,221        3,954,309        4,402,229
Nodes         3,463,959,970    3,926,828,147    4,197,365,421
Ways          360,469,340      416,654,804      454,113,805
Relations     4,387,699        5,043,226        5,390,806
GPS traces    5,280,183,660    5,715,425,150    5,953,688,363

Top user has contributed 326,511,847 (6%) GPS traces.
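
Growth between two snapshots follows directly from the table. A small sketch, with the June 2017 and November 2017 figures copied from the table above; the function name is an assumption.

```python
# Figures copied from the OSM statistics table (June 2017 and Nov 2017 columns).
JUNE_2017 = {
    "Users": 3_954_309,
    "Nodes": 3_926_828_147,
    "Ways": 416_654_804,
    "Relations": 5_043_226,
    "GPS traces": 5_715_425_150,
}
NOV_2017 = {
    "Users": 4_402_229,
    "Nodes": 4_197_365_421,
    "Ways": 454_113_805,
    "Relations": 5_390_806,
    "GPS traces": 5_953_688_363,
}


def growth_percent(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


for metric, old in JUNE_2017.items():
    print(f"{metric}: {growth_percent(old, NOV_2017[metric]):.1f}%")
```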

Source

OSM Open Data Ecosystem

  • Many applications using the data
  • Many services based on the data
  • Many (open source) tools for handling the data
  • Primary application areas
    • Map Rendering (One Dataset, several renderings)
    • Geo Search (POI, (Reverse) Name Resolution)
    • Routing
    • Geographic Database
    • Data Editors

Map Rendering

Map Compare

Source: F4 Map

Routing

Data Service - Overpass

Example Query: Chinese Restaurants on the map

node
  [amenity=restaurant]
  [cuisine=chinese]
  ({{bbox}});
out;

Overpass Query: Streets on the map

way({{bbox}})
  [highway]
  [name];
out;
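
The `{{bbox}}` placeholder in the queries above is an Overpass Turbo shortcut for the visible map area; when the Overpass API is called directly it has to be replaced by explicit `south,west,north,east` coordinates. A minimal sketch, assuming the public endpoint `https://overpass-api.de/api/interpreter`; the helper names are hypothetical and the bounding box (roughly around Pforzheim) is only illustrative.

```python
import urllib.parse

# Public Overpass API endpoint; the query is passed in the "data" parameter.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"

# The restaurant query from the slide, unchanged.
QUERY_TEMPLATE = """node
  [amenity=restaurant]
  [cuisine=chinese]
  ({{bbox}});
out;"""


def expand_bbox(template: str, south: float, west: float,
                north: float, east: float) -> str:
    """Replace the {{bbox}} shortcut with explicit coordinates."""
    return template.replace("{{bbox}}", f"{south},{west},{north},{east}")


def build_request_url(query: str) -> str:
    """Build a GET URL for the query (the request itself is not sent here)."""
    return OVERPASS_ENDPOINT + "?" + urllib.parse.urlencode({"data": query})


# Illustrative bounding box, chosen by hand.
query = expand_bbox(QUERY_TEMPLATE, 48.85, 8.65, 48.92, 8.75)
print(query)
print(build_request_url(query))
```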

Case Study WheelMap.org

Source: http://wheelmap.org
Source: OpenStreetMap

Conclusion

  • Community-driven data ecosystems are thriving
  • High and increasing participation
  • Contribution to data sets at the core of the ecosystem
  • Major issues:
    • Incompatibility of licenses (ODbL vs. CC)
    • No global identifiers (linking data sets is still hard); Wikidata provides a basis for bridging IDs
    • New community-driven data ecosystems will probably be domain-focused (winner takes all)
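
One existing bridge between the two ecosystems is the OSM tag key `wikidata`, whose value is a Wikidata item ID. The sketch below turns such a tag into a global identifier of the form `http://www.wikidata.org/entity/Q...`. The function name and the example tags are assumptions; the ID Q515 is borrowed from the query earlier in the talk purely as a syntactically valid placeholder.

```python
import re
from typing import Dict, Optional

# Wikidata entity URIs share this prefix.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_uri(tags: Dict[str, str]) -> Optional[str]:
    """Return the Wikidata URI for an OSM element, or None if it has no valid link."""
    qid = tags.get("wikidata", "")
    if QID_PATTERN.fullmatch(qid):
        return ENTITY_PREFIX + qid
    return None


# Invented OSM tag sets.
linked = {"name": "Example Place", "wikidata": "Q515"}
unlinked = {"name": "Another Place"}

print(wikidata_uri(linked))
print(wikidata_uri(unlinked))
```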

Success Factors for Open Data Ecosystems

Success Factors

  • Adopt emerging best practices, see [1, 2, 3, 4, 5, 6, 7]
  • Define a priority domain
  • Create linkable data with global identifiers (URIs)
  • Allow extensions by third party contributions
  • Use feedback cycles including versioning
  • Track progress with statistics on agreed-upon metrics
  • Manage the community
  • Disseminate!

Data ecosystems are all about people

Source: https://en.wikipedia.org/wiki/Magnus_Manske

Thank you for your attention! Questions?