What is open data?
Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.
Source: http://opendefinition.org/
What is a data ecosystem?
A community of interacting organizations and individuals that produce, use and reuse a set of data. The dataset is the keystone around which applications and services provide value and thereby become part of the data ecosystem.
Ecosystem members can have various roles. Common roles are contributor, supplier, aggregator, enabler, enricher, developer as well as the common user.
Source: Own definition
Wikidata Purpose
- Centralize the facts from Wikipedia infoboxes
- For reuse across 300 Wikipedia languages
- e.g. 78 articles about Zika had different infoboxes
- For querying and use by third party apps
- Improve interwiki links
Wikidata properties
- a knowledge graph based on items
- free and open
- collaborative
- multilingual
- manually curated (unlike DBpedia)
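Wikidata size, June 2017 vs. November 2017
- Items: 26.3 million (June), 38.6 million (November)
- Statements about items: 150 million (June), 326 million (November)
- Edits made since launch: 500 million (June), 597 million (November)
- Active users: more than 17 thousand (June), more than 18.6 thousand (November)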
- Observe growth in detail statistics
Wikidata Query Service
What are the 10 largest cities with a female mayor?
SELECT DISTINCT ?cityLabel ?population ?mayorLabel
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515 .   # find instances of subclasses of city
  ?city p:P6 ?statement .             # with a P6 (head of government) statement
  ?statement ps:P6 ?mayor .           # ... that has the value ?mayor
  ?mayor wdt:P21 wd:Q6581072 .        # ... where the ?mayor has P21 (sex or gender) female
  FILTER NOT EXISTS { ?statement pq:P582 ?x }   # ... but the statement has no P582 (end date) qualifier

  # Now select the population value of the ?city
  # (wdt: properties use only statements of "preferred" rank if any, usually meaning "current population")
  ?city wdt:P1082 ?population .

  # Optionally, find English labels for city and mayor:
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?population)
LIMIT 10
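To reuse such a query from a third-party application, it can be sent to the query service's SPARQL endpoint over HTTP. The following is a minimal Python sketch under stated assumptions, not an official client: the helper names build_wdqs_url and extract_rows are hypothetical, the query is a shortened variant of the one above, and no network request is made here. The result shape follows the standard SPARQL JSON results format.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Shortened variant of the query above (comments removed).
QUERY = """
SELECT DISTINCT ?cityLabel ?population ?mayorLabel
WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city p:P6 ?statement .
  ?statement ps:P6 ?mayor .
  ?mayor wdt:P21 wd:Q6581072 .
  FILTER NOT EXISTS { ?statement pq:P582 ?x }
  ?city wdt:P1082 ?population .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_wdqs_url(query, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


def extract_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


# Invented sample response, only to illustrate the shape of the JSON results.
sample_result = {
    "head": {"vars": ["cityLabel", "population", "mayorLabel"]},
    "results": {
        "bindings": [
            {
                "cityLabel": {"type": "literal", "value": "Example City"},
                "population": {"type": "literal", "value": "1000000"},
                "mayorLabel": {"type": "literal", "value": "Example Mayor"},
            }
        ]
    },
}

print(build_wdqs_url(QUERY)[:60])
print(extract_rows(sample_result))
```

A real client would fetch the URL with an HTTP library and should send a descriptive User-Agent header when doing so.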
Wikidata Queries - DefaultViews
Timeline of space probe launches.
#defaultView:Timeline
SELECT ?item ?itemLabel ?launchdate (SAMPLE(?image) AS ?image)
WHERE
{
  ?item wdt:P31 wd:Q26529 .
  ?item wdt:P619 ?launchdate .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  OPTIONAL { ?item wdt:P18 ?image }
}
GROUP BY ?item ?itemLabel ?launchdate
Wikidata Queries - Maps
Map of Lighthouses in Norway
OSM Background
- OpenStreetMap (OSM) was created in July 2004 by Steve Coast, then a UCL student
- He did not understand why the Ordnance Survey created massive geographic datasets but did not freely distribute them to those who had paid to create them
- Geodata was freely available only in some countries, e.g. the US and the Netherlands
OSM Properties
- Collaborative
- maintained by individual contributors
- Wikipedia principle, everyone can edit and contribute
- Donated data sets imported in bulk (particularly Eastern Europe)
- Automated robots cleaning data
- Open Data, free to use (under the ODbL)
- Not a map, but a database
OSM hypergraph
- Nodes: basic geographic point.
- Geographic point: latitude & longitude (WGS84)
- Point Of Interest (POIs)
- Ways: ordered interconnection of nodes
- open ways = linear features (roads, railways…)
- closed ways = areas
- Relations: groups of primitives with associated roles
- Relate nodes, ways and potentially other relations to each other
- thereby forming complex objects (e.g. multipolygons)
- Nodes, ways, relations are versioned and user attributed
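The three primitives can be sketched as plain data structures. The Python below is an illustrative model only, not the schema of any OSM tool: the class names, field names, and the closed-way check are assumptions made for this example, and all ids and coordinates are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float  # WGS84 latitude in degrees
    lon: float  # WGS84 longitude in degrees
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_refs: list  # ordered list of node ids
    tags: dict = field(default_factory=dict)

    def is_closed(self):
        # Assumption for this sketch: a closed way repeats its first node
        # at the end and has at least three distinct corners.
        return len(self.node_refs) >= 4 and self.node_refs[0] == self.node_refs[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int   # id of the referenced primitive
    role: str  # e.g. "outer" or "inner" in a multipolygon


@dataclass
class Relation:
    id: int
    members: list
    tags: dict = field(default_factory=dict)

    def refs_with_role(self, role):
        """Return the ids of all members that carry the given role."""
        return [m.ref for m in self.members if m.role == role]


# Invented example data: a road (open way) and a lake with an island.
road = Way(id=10, node_refs=[1, 2, 3], tags={"highway": "residential"})
shore = Way(id=11, node_refs=[4, 5, 6, 4])
island = Way(id=12, node_refs=[7, 8, 9, 7])
lake = Relation(
    id=20,
    members=[Member("way", 11, "outer"), Member("way", 12, "inner")],
    tags={"type": "multipolygon", "natural": "water"},
)

print(road.is_closed(), shore.is_closed())
print(lake.refs_with_role("outer"), lake.refs_with_role("inner"))
```

The relation refers to its members by id and role rather than embedding them, which is what lets one way take part in several relations.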
OSM Elements
Each OSM entity (node, way, relation) has:
- a numeric identifier: OSM ID
- a set of generic attributes present for every element
- uid, user: user id and user name
- timestamp: time of the last modification
- visible: if false then the element should only be returned by history calls
- version: edit version of the object (starts from 1)
- changeset: the changeset (group of edits made within a certain time by one user) in which the object was created or updated
- a set of tags (key-value pairs)
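In the OSM XML format these generic attributes appear as XML attributes on the element, and tags appear as child elements with k and v attributes. The Python sketch below reads them with the standard library; the sample node and all its values are invented for illustration, and parse_element is a hypothetical helper, not part of any OSM library.

```python
import xml.etree.ElementTree as ET

# Invented sample element in OSM XML form.
SAMPLE_NODE = """
<node id="1" lat="51.5" lon="-0.1" version="3" changeset="42"
      timestamp="2017-11-01T12:00:00Z" user="example_mapper" uid="7"
      visible="true">
  <tag k="amenity" v="restaurant"/>
  <tag k="cuisine" v="chinese"/>
</node>
"""


def parse_element(xml_text):
    """Extract the generic attributes and the tags of one OSM element."""
    elem = ET.fromstring(xml_text.strip())
    return {
        "type": elem.tag,                       # node, way or relation
        "id": int(elem.get("id")),              # OSM ID
        "uid": int(elem.get("uid")),            # user id
        "user": elem.get("user"),               # user name
        "timestamp": elem.get("timestamp"),     # time of the last modification
        # Assumption: treat a missing "visible" attribute as visible.
        "visible": elem.get("visible", "true") == "true",
        "version": int(elem.get("version")),    # starts from 1
        "changeset": int(elem.get("changeset")),
        "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
    }


parsed = parse_element(SAMPLE_NODE)
print(parsed["type"], parsed["id"], parsed["version"])
print(parsed["tags"])
```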
OSM statistics
Users | 2,867,221 | 3,954,309 | 4,402,229
Nodes | 3,463,959,970 | 3,926,828,147 | 4,197,365,421
Ways | 360,469,340 | 416,654,804 | 454,113,805
Relations | 4,387,699 | 5,043,226 | 5,390,806
GPS traces | 5,280,183,660 | 5,715,425,150 | 5,953,688,363
Top user has contributed 326,511,847 (6%) GPS traces.
OSM Open Data Ecosystem
- Many applications using the data
- Many services based on the data
- Many (open source) tools for handling the data
- Primary application areas
- Map Rendering (One Dataset, several renderings)
- Geo Search (POI, (Reverse) Name Resolution)
- Routing
- Geographic Database
- Data Editors
Geo Search
Nominatim
- search for a name or address (forward search)
- look up data by its geographic coordinate (reverse search)
- Each result comes with a link to a details page where you can inspect the data
- Debug info to investigate how the address of the object has been computed.
- Available as open source
- In production on the main OSM site and available via APIs
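Forward and reverse search map onto two HTTP endpoints of the public Nominatim instance, /search and /reverse. The Python sketch below only builds the request URLs and sends nothing; the helper names are hypothetical and the address and coordinates are arbitrary examples. Real use of the public instance should respect its usage policy.

```python
from urllib.parse import urlencode

# Base URL of the public Nominatim instance.
NOMINATIM = "https://nominatim.openstreetmap.org"


def forward_search_url(query, limit=5):
    """URL for a forward search: free-text name or address to places."""
    params = {"q": query, "format": "json", "limit": limit}
    return NOMINATIM + "/search?" + urlencode(params)


def reverse_search_url(lat, lon):
    """URL for a reverse search: coordinate to the nearest known object."""
    params = {"lat": lat, "lon": lon, "format": "json"}
    return NOMINATIM + "/reverse?" + urlencode(params)


print(forward_search_url("Unter den Linden 1, Berlin"))
print(reverse_search_url(52.5, 13.4))
```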
Data Service - Overpass
Example Query: Chinese Restaurants on the map
node
[amenity=restaurant]
[cuisine=chinese]
({{bbox}});
out;
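The {{bbox}} placeholder is an Overpass Turbo shortcut for the visible map area; a script talking to the Overpass API directly has to supply the bounding box itself, in the order south, west, north, east. The Python sketch below builds the same query with an explicit box and the form-encoded body for the interpreter endpoint. The function names and the example coordinates are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode

# Interpreter endpoint of a public Overpass API instance.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def restaurant_query(south, west, north, east, cuisine="chinese"):
    """Build the Overpass QL query above with an explicit bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json];\n"          # ask for JSON instead of the default XML
        "node\n"
        "  [amenity=restaurant]\n"
        f"  [cuisine={cuisine}]\n"
        f"  ({bbox});\n"
        "out;"
    )


def request_body(query):
    """Form-encoded body for a POST to the interpreter endpoint."""
    return urlencode({"data": query})


# Arbitrary example box roughly covering central Berlin.
query = restaurant_query(52.4, 13.3, 52.6, 13.5)
print(query)
print(request_body(query)[:40])
```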
Conclusion
- Community-driven data ecosystems are thriving
- High and increasing participation
- Contribution to data sets at the core of the ecosystem
- Major issues:
- Incompatibility of licenses (ODbL vs. CC)
- No global identifiers (linking data sets is still hard); Wikidata provides a basis for bridging IDs
- New community-driven data ecosystems will probably be domain-focused (winner takes all)