Wednesday 27 April 2011

Linked Data workshop

Linked data and GI data workshop
University of Portsmouth 26th April 2011

The workshop, a precursor to this year’s GISRUK conference, focussed on the application of Linked Data to spatial data.

Richard Wallis of Talis provided an overview of Linked Data. The Research funding explorer uses funding data that has been transformed into Linked Data and plotted on Google Maps. The data is stored in the Talis Platform, which provides data hosting and an API for RDF. The Talis Connected Commons scheme provides free data hosting for those wishing to make their datasets freely available as Linked Data. Talis have also been involved in the data.gov.uk initiative.

DBpedia attempts to add structure to the data in Wikipedia and publish it as Linked Data, so that it can be re-used in other applications. The data can be queried and linked to other data.

Linked Data is graph-based and self-describing.

John Goodwin from the Ordnance Survey focussed on location and Linked Data. John has been involved in publishing OS datasets as Linked Data.

RDF is to the web of data as HTML is to the web of documents.

RDF is made up of triples, each consisting of a subject, a predicate and an object, e.g. “John is based near Southampton”. Each element of the triple is identified by a URI. Triples can be stored in triplestores, databases designed to store triples. Just as SQL can be used to query relational databases, SPARQL can be used to query RDF data.
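To make the triple idea concrete, here is a minimal sketch of a toy in-memory triple store with a single-pattern query, loosely mimicking what a SPARQL query like SELECT ?place WHERE { ex:John ex:basedNear ?place } does. The example.org URIs and the match() helper are hypothetical illustrations, not part of any real triplestore or SPARQL engine.

```python
# Toy triple store: each triple is (subject, predicate, object).
# All URIs are hypothetical examples under http://example.org/.
EX = "http://example.org/"

triples = {
    (EX + "John", EX + "basedNear", EX + "Southampton"),
    (EX + "John", EX + "worksFor", EX + "OrdnanceSurvey"),
    (EX + "Southampton", EX + "locatedIn", EX + "Hampshire"),
}


def match(pattern, store):
    """Return variable bindings for one triple pattern (hypothetical helper).

    Terms starting with '?' are variables; everything else must match exactly.
    A variable used twice in the pattern must bind to the same value.
    """
    results = []
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable, conflicting values
                binding[term] = value
            elif term != value:
                break  # fixed term does not match
        else:
            results.append(binding)
    return results


print(match((EX + "John", EX + "basedNear", "?place"), triples))
```

A real triplestore indexes triples and supports joins across many patterns; this sketch only shows the shape of the data and the idea of matching a pattern with variables.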

An OS Linked Data URI is of the form:
http://data.ordnancesurvey.co.uk/id/7000000000017711

If the URI is entered into a web browser, the server will convert the RDF into a human-readable form.
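One common way a Linked Data server can serve both people and programs from the same URI is HTTP content negotiation: it looks at the request's Accept header and returns an HTML page or raw RDF accordingly. The sketch below assumes that mechanism; it is not a description of how the Ordnance Survey's server actually works, and the choose_representation() function and its list of offered formats are hypothetical.

```python
def choose_representation(accept_header: str) -> str:
    """Pick a response media type from an HTTP Accept header (simplified sketch).

    Browsers usually ask for text/html and so get a human-readable page;
    RDF-aware clients ask for an RDF syntax and get the triples.
    q-values are ignored: the first offered type listed wins.
    """
    offered = ["text/html", "application/rdf+xml", "text/turtle"]  # assumed formats
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in requested:
        if media_type in offered:
            return media_type
    return "text/html"  # fall back to the human-readable page


print(choose_representation("text/html,application/xhtml+xml"))
print(choose_representation("text/turtle"))
```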

Silver Oliver, an Information Architect at the BBC, described some of the BBC’s work with Linked Data in a talk entitled “Enabling ontology driven information architecture with Linked Data”. His premise is that people are interested in things, not documents. The BBC Wildlife Finder uses DBpedia to add background information to the BBC’s images and videos. The BBC also developed a sport ontology, which was used in the development of their World Cup 2010 website. Once data is structured using ontologies, relevant content can be added automatically to, say, the page for a particular football match, and Wayne Rooney the England player can be distinguished from Wayne Rooney the Manchester United player.

Mike Turnill described Oracle’s support for semantic technologies, and Linked Data in particular. Oracle 11g is capable of storing and querying billions of RDF triples. It can also infer derived data (e.g. if ‘a’ is part of ‘b’ and ‘b’ is part of ‘c’, then ‘a’ must be part of ‘c’). RDF-specific queries can also be embedded in SQL queries.
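The ‘part of’ example is a transitive rule. As a rough illustration of the kind of inference involved, here is a naive fixed-point computation in plain Python that keeps adding implied pairs until nothing new appears. It is a sketch of the idea only, not Oracle’s API or its rule engine; the transitive_closure() helper and the place data are illustrative.

```python
def transitive_closure(pairs):
    """Infer every (a, c) implied by chains (a, b), (b, c); naive fixed point."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:  # nothing new inferred, so we are done
            return closure
        closure |= new


# Illustrative "part of" facts.
part_of = {("Southampton", "Hampshire"), ("Hampshire", "England")}
inferred = transitive_closure(part_of) - part_of
print(inferred)  # {('Southampton', 'England')}
```

Production reasoners use far more efficient strategies than re-joining the whole set each round, but the result for a transitive property is the same.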

Paul Watson from 1Spatial discussed spatial queries in OWL (the Web Ontology Language). The problem is that conventional geometric data is not graph-oriented, but it is possible to model spatial relations using graph notation. Using the ontology editor Protégé, it is possible to answer questions such as which bus routes run along a particular road (an edge, in graph terminology) or which roads meet at a junction (a node). In conclusion, OWL reasoners can be used to perform spatial queries.
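The graph view of a road network can be illustrated without any ontology tooling. In this sketch roads are edges between junctions (nodes), and bus routes are lists of the road segments they use; the two questions from the talk then become simple lookups. It uses plain Python rather than OWL, Protégé or a reasoner, and the road, junction and route names are hypothetical.

```python
# Hypothetical road network: each road segment is an edge between two junctions.
roads = {
    "HighStreet": ("J1", "J2"),
    "StationRoad": ("J2", "J3"),
    "MillLane": ("J2", "J4"),
}

# Hypothetical bus routes, each listed as the road segments it runs along.
bus_routes = {
    "Route1": ["HighStreet", "StationRoad"],
    "Route7": ["HighStreet", "MillLane"],
    "Route9": ["MillLane"],
}


def routes_along(road):
    """Which bus routes run along this road (edge)?"""
    return sorted(route for route, segments in bus_routes.items() if road in segments)


def roads_at(junction):
    """Which roads meet at this junction (node)?"""
    return sorted(name for name, ends in roads.items() if junction in ends)


print(routes_along("HighStreet"))  # ['Route1', 'Route7']
print(roads_at("J2"))              # ['HighStreet', 'MillLane', 'StationRoad']
```

In OWL the same facts would be expressed as properties between individuals, and the reasoner, rather than hand-written loops, would answer the queries.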

The post-lunch session focussed on academic Linked Data projects. Jo Walsh from Edinburgh described the Chalice project, which aims to build a gazetteer of English place names and make it available as Linked Data. The source data, held on index cards, is scanned and geo-parsed using the Edinburgh geoparser.

The sameAs service takes a URI or a string as input and returns any URIs that are co-referent. So, for example, supplying the OS URI from above will return URIs at openlylocal.com and statistics.data.gov.uk.
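Co-reference works because sameAs links are symmetric and transitive: if A is the same as B and B is the same as C, all three identify one thing. A minimal way to group such links into bundles is union-find, sketched below. This is an assumption about how one might compute co-reference, not a description of how the sameAs service is implemented, and the example.org/com/net URIs are hypothetical.

```python
def coreference_bundles(same_as_pairs):
    """Group URIs into bundles of co-referent identifiers using union-find.

    Each (a, b) pair asserts that a and b identify the same thing.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)  # merge the two bundles

    bundles = {}
    for uri in list(parent):
        bundles.setdefault(find(uri), set()).add(uri)
    return list(bundles.values())


def lookup(uri, bundles):
    """Return every URI co-referent with `uri`, excluding the URI itself."""
    for bundle in bundles:
        if uri in bundle:
            return bundle - {uri}
    return set()


# Hypothetical sameAs assertions.
pairs = [
    ("http://example.org/a", "http://example.com/b"),
    ("http://example.com/b", "http://example.net/c"),
    ("http://example.org/x", "http://example.org/y"),
]
bundles = coreference_bundles(pairs)
print(sorted(lookup("http://example.org/a", bundles)))
```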

Leif Isaksen from the University of Southampton presented on “Annotating the Ancient World”. He described the PELAGIOS project, which aims to supply location data about the ancient world as Linked Data. His contention was that place, not co-ordinates, is the lowest common denominator: most space is empty, and the interesting stuff is what happens at places.

Humphrey Southall (University of Portsmouth) and Richard Light described a project for exposing the GB Historical GIS as Linked Data. Administrative units are important historically, but they are legal units more than geographical features, so an ontology for administrative units was required. The project aims to use the conversion of the Isle of Wight’s administrative units into Linked Data as a use case. The source data is held in a relational database.

The next session was a breakout session. I attended the Getting Started session. Silver Oliver led the group in a practical exercise: converting a list of British Prime Ministers from the Guardian Datastore into Linked Data. The dataset (held as a Google Docs spreadsheet) can be loaded into Google Refine, a desktop application that can generate RDF from supplied data. It does require the user to map each column of data (name, year took office, party) to a description. Vocabularies such as Friend of a Friend (FOAF) can be used to describe standard properties such as a person’s name (here, the Prime Minister’s). For the political party, DBpedia URIs can be used. The main lesson from the breakout group was that some thought has to be given to the definition of your data: you need to be clear who is going to use your Linked Data and for what purpose.
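The column mapping done in Google Refine can be sketched by hand: each spreadsheet row becomes a few triples, with the name described using FOAF’s name property and the party pointing at a DBpedia resource. The sketch below writes N-Triples and is not what Google Refine produces. The example.org subject and vocabulary URIs are hypothetical, the DBpedia resource names are assumed, and the rows are illustrative rather than taken from the Guardian dataset.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"
DBPEDIA = "http://dbpedia.org/resource/"
VOCAB = "http://example.org/vocab/"  # hypothetical vocabulary for year and party
BASE = "http://example.org/pm/"      # hypothetical namespace for Prime Ministers

# Illustrative rows: (name, year took office, assumed DBpedia party resource).
rows = [
    ("Tony Blair", 1997, "Labour_Party_(UK)"),
    ("Gordon Brown", 2007, "Labour_Party_(UK)"),
    ("David Cameron", 2010, "Conservative_Party_(UK)"),
]


def row_to_ntriples(name, year, party):
    """Map one spreadsheet row to N-Triples lines (literal escaping omitted)."""
    subject = f"<{BASE}{name.replace(' ', '_')}>"
    return [
        f'{subject} <{FOAF_NAME}> "{name}" .',
        f'{subject} <{VOCAB}tookOffice> "{year}"^^<{XSD_GYEAR}> .',
        f"{subject} <{VOCAB}party> <{DBPEDIA}{party}> .",
    ]


for row in rows:
    print("\n".join(row_to_ntriples(*row)))
```

The choice of predicates is exactly the “definition of your data” question from the breakout: reusing FOAF and DBpedia makes the output easier for others to link to than inventing everything locally.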

Conclusions: someone commented that the barrier to entry for supplying semantic data is much higher than for supplying unstructured, old-style web pages. This is true and perhaps inevitable, but tools will become available to make it easier. The concepts are also more complicated: identifying things by URIs, ontologies, etc. I think I now have an idea of how I could create Linked Data from a given dataset. But that’s only part of the process: how is Linked Data discovered and then consumed? For a given postcode, say, how does an application automatically link to data related to that postcode?

Note: If any of the talks have been misrepresented above, then the fault is mine rather than the speaker’s. Apparently those of us with a relational database background have trouble getting our heads round the concepts of RDF and Linked Data, so I’ve got an excuse!
