Linked data and GI data workshop
University of Portsmouth 26th April 2011
The workshop, a precursor to this year’s GISRUK conference, focussed on the application of Linked Data to spatial data.
Richard Wallis of Talis provided an overview of Linked Data. The Research funding explorer uses funding data that has been converted into Linked Data and displayed on Google Maps. The data is stored in the Talis Platform, which provides data hosting and an API for RDF. The Talis Connected Commons scheme provides free data hosting for those wishing to make their datasets freely available as Linked Data. Talis have also been involved in the data.gov.uk initiative.
DBpedia extracts structured data from Wikipedia and publishes it as Linked Data, so that it can be re-used in other applications. The data can be queried and linked to other datasets.
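DBpedia can be queried with SPARQL, the RDF query language described further below. Here is a minimal sketch of how a query is sent over HTTP: the SPARQL protocol puts the URL-encoded query in a `query` parameter. The endpoint address (https://dbpedia.org/sparql) and the example query are my assumptions, not something shown at the workshop.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be available at this address).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical example query: the English abstract of the DBpedia resource for Portsmouth.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Portsmouth> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL: the query travels URL-encoded in `query`."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(DBPEDIA_ENDPOINT, QUERY))
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, would run the query, provided the endpoint is up.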
Linked Data is graph-based and self-describing.
John Goodwin from the Ordnance Survey focussed on Location and Linked Data. John has been involved in publishing OS datasets as Linked Data.
RDF is to the web of data as HTML is to the web of documents.
RDF is based on triples of a subject, a predicate and an object, e.g. John is based near Southampton. The subject and predicate are identified by URIs; the object is either a URI or a literal value. Triples can be stored in triplestores, databases designed to store triples. Just as SQL can be used to query relational databases, SPARQL can be used to query RDF data.
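To make the triple idea concrete, here is a toy in-memory "triplestore" in Python, with a pattern-matching function that behaves roughly like a single SPARQL triple pattern. It is only an illustration: the example.org URIs are hypothetical, and real triplestores index and query triples far more efficiently.

```python
# A toy triplestore: a set of (subject, predicate, object) tuples.
# The example.org URIs are hypothetical; foaf:based_near and foaf:name are real FOAF terms.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = {
    (EX + "John", FOAF + "based_near", EX + "Southampton"),
    (EX + "John", FOAF + "name", "John"),  # the object here is a literal, not a URI
    (EX + "Southampton", EX + "locatedIn", EX + "England"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts like a SPARQL variable."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly equivalent to: SELECT ?place WHERE { ex:John foaf:based_near ?place }
for _, _, place in match(triples, s=EX + "John", p=FOAF + "based_near"):
    print(place)  # http://example.org/Southampton
```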
An OS Linked Data URI is of the form:
http://data.ordnancesurvey.co.uk/id/7000000000017711
If the URI is entered into a web browser, the server returns a human-readable (HTML) rendering of the RDF.
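Linked Data servers commonly decide what to return through HTTP content negotiation: a browser's `Accept` header asks for HTML, while an RDF client asks for an RDF serialisation. The sketch below builds both kinds of request with the standard library; whether the OS server still answers at this URI, and which formats it offers, are assumptions, so the network call is wrapped in a function that is not run here.

```python
import urllib.request

OS_URI = "http://data.ordnancesurvey.co.uk/id/7000000000017711"


def make_request(uri: str, want_rdf: bool) -> urllib.request.Request:
    """Build a GET request asking for RDF/XML, or for HTML as a browser would."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_content_type(uri: str, want_rdf: bool) -> str:
    """Send the request and report which media type the server chose.

    Assumes network access and that the server still performs content negotiation.
    """
    with urllib.request.urlopen(make_request(uri, want_rdf), timeout=10) as resp:
        return resp.headers.get("Content-Type", "")
```

The URI stays the same in both cases; only the `Accept` header changes what comes back.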
Silver Oliver, an Information Architect at the BBC, described some of the BBC’s work with Linked Data in a talk entitled “Enabling ontology driven information architecture with Linked Data”. His premise is that people are interested in things, not documents. The BBC Wildlife Finder uses DBpedia to add background information to the BBC’s images and videos. The BBC also developed a sport ontology, which was used to build its World Cup 2010 website. Because the data is structured using ontologies, relevant content can be added automatically to, say, the page for a particular football match, and Wayne Rooney playing for England can be distinguished from Wayne Rooney playing for Manchester United.
Mike Turnill described Oracle’s support for semantic technologies, and Linked Data in particular. Oracle 11g can store and query billions of RDF triples. It can also derive new data by inference (e.g. if ‘a’ is part of ‘b’ and ‘b’ is part of ‘c’, then ‘a’ must be part of ‘c’). RDF-specific queries can also be embedded in SQL queries.
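The "part of" example is a transitive relation. This naive forward-chaining sketch shows the kind of inference involved, repeatedly joining pairs until nothing new is derived. It is not Oracle's API or implementation, just a plain Python illustration using the letters from the example.

```python
def transitive_closure(pairs):
    """Infer every (a, c) implied by chains of (a, b), (b, c) pairs.

    Naive forward chaining: keep joining the relation with itself until no new
    pairs appear. Fine for small examples; real RDF stores do this far more cleverly.
    """
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new = derived - closure
        if not new:
            return closure
        closure |= new


part_of = {("a", "b"), ("b", "c"), ("c", "d")}
inferred = transitive_closure(part_of) - part_of
print(sorted(inferred))  # [('a', 'c'), ('a', 'd'), ('b', 'd')]
```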
Paul Watson from 1Spatial discussed spatial queries in OWL (the Web Ontology Language). The problem is that conventional geometric data is not graph-oriented, but spatial relations can be modelled using graph notation. Using the ontology editor Protégé, it is possible to answer questions such as which bus routes run along a particular road (an edge, in graph terminology) or which roads meet at a junction (a node). In conclusion, OWL reasoners can be used to perform spatial queries.
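The same two questions can be illustrated with a plain graph model: junctions are nodes, roads are edges, and bus routes are sequences of roads. This sketch uses ordinary Python dictionaries rather than OWL and a reasoner, and the network and route names are entirely hypothetical.

```python
# Hypothetical road network: each road is an edge between two junctions (nodes).
roads = {
    "High Street": ("J1", "J2"),
    "Station Road": ("J2", "J3"),
    "Park Lane": ("J2", "J4"),
}

# Hypothetical bus routes, each given as the list of roads (edges) it runs along.
bus_routes = {
    "Route 1": ["High Street", "Station Road"],
    "Route 7": ["Park Lane"],
    "Route 9": ["High Street", "Park Lane"],
}


def roads_at_junction(junction):
    """All roads (edges) that meet at the given junction (node)."""
    return sorted(name for name, ends in roads.items() if junction in ends)


def routes_along(road):
    """All bus routes that run along the given road."""
    return sorted(route for route, path in bus_routes.items() if road in path)


print(roads_at_junction("J2"))      # ['High Street', 'Park Lane', 'Station Road']
print(routes_along("High Street"))  # ['Route 1', 'Route 9']
```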
The post-lunch session focussed on academic Linked Data projects. Jo Walsh from Edinburgh described the Chalice project, which aims to build a gazetteer of English place names and publish it as Linked Data. The source material, a set of index cards, is scanned and geoparsed using the Edinburgh Geoparser.
The sameAs service takes a URI or a string as input and returns any URIs that are co-referent, i.e. that identify the same thing. So, for example, supplying the OS URI from above will return URIs at openlylocal.com and statistics.data.gov.uk.
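One way to think about co-reference is as grouping URIs into "bundles" wherever owl:sameAs-style links connect them, directly or through a chain. The sketch below does this with a union-find structure. It is not how the sameAs service is implemented, and apart from the OS URI the URIs are hypothetical placeholders for the openlylocal.com and statistics.data.gov.uk ones.

```python
OS_URI = "http://data.ordnancesurvey.co.uk/id/7000000000017711"
# Hypothetical stand-ins for the co-referent URIs on other sites.
LOCAL_URI = "http://example.org/openlylocal/area/1"
STATS_URI = "http://example.org/statistics/area/1"


def coreference_bundles(same_as_pairs):
    """Group URIs linked directly or indirectly by sameAs pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    bundles = {}
    for uri in parent:
        bundles.setdefault(find(uri), set()).add(uri)
    return {uri: bundles[find(uri)] for uri in parent}


def lookup(index, uri):
    """Return the URIs co-referent with `uri`, excluding `uri` itself."""
    return sorted(index.get(uri, {uri}) - {uri})


index = coreference_bundles([(OS_URI, LOCAL_URI), (LOCAL_URI, STATS_URI)])
print(lookup(index, OS_URI))
```

Because the links chain, the OS URI finds the statistics URI even though no pair connects them directly.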
Leif Isaksen from the University of Southampton presented on “Annotating the Ancient World”. He described the PELAGIOS project, which aims to publish location data for the ancient world as Linked Data. His contention was that place, not co-ordinates, is the lowest common denominator: most space is empty, and the interesting stuff is what happens at places.
Humphrey Southall (University of Portsmouth) and Richard Light described a project to expose the GB Historical GIS as Linked Data. Administrative units are historically important, but they are more legal units than geographical features, so an ontology for an administrative unit was required. The project aims to use the conversion of Isle of Wight administrative units into Linked Data as a use case. The source data is held in a relational database.
The next session was a breakout session; I attended the Getting Started group. Silver Oliver led the group in a practical exercise using a list of British Prime Ministers from the Guardian Datastore. The dataset (held as a Google Docs spreadsheet) can be loaded into Google Refine, a desktop application that can be used to generate RDF from tabular data. This does require the user to map each column of data (name, year took over, party) to a description. Ontologies such as Friend of a Friend (FOAF) can be used for standard properties such as a person's name (here, the Prime Minister's name). For the political party, DBpedia can be used. The main lesson from the breakout group was that some thought has to be given to how your data is defined: you need to be clear who is going to use your Linked Data and for what purpose.
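Google Refine does this mapping interactively. The same idea can be sketched in a few lines of Python that turn spreadsheet rows into N-Triples. The base URI, the `tookOffice` and `party` properties, the URI-minting scheme and the party-to-DBpedia mapping are all hypothetical choices for illustration; only the FOAF and RDF terms are standard.

```python
import csv
import io

# Hypothetical namespace for minted URIs and the two non-FOAF properties.
BASE = "http://example.org/pm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical mapping from the spreadsheet's party names to DBpedia resources.
PARTY_URIS = {
    "Labour": DBPEDIA + "Labour_Party_(UK)",
    "Conservative": DBPEDIA + "Conservative_Party_(UK)",
}

SAMPLE_CSV = """name,year took over,party
Margaret Thatcher,1979,Conservative
Tony Blair,1997,Labour
"""


def literal(text):
    """Quote a plain N-Triples literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_ntriples(csv_text):
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mint a URI for each Prime Minister from their name (a naive scheme).
        subject = "<" + BASE + row["name"].replace(" ", "_") + ">"
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person> .")
        lines.append(f"{subject} <{FOAF}name> {literal(row['name'])} .")
        lines.append(f"{subject} <{BASE}tookOffice> {literal(row['year took over'])} .")
        party = PARTY_URIS.get(row["party"])
        if party:  # link to DBpedia rather than repeating the party name as text
            lines.append(f"{subject} <{BASE}party> <{party}> .")
    return "\n".join(lines)


print(rows_to_ntriples(SAMPLE_CSV))
```

Even in this tiny example the design questions from the session show up: how to mint URIs, which existing vocabulary to reuse, and when to link out instead of storing a string.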
Conclusions: someone commented that the barrier to entry for publishing semantic data is much higher than for publishing old-style unstructured web pages. This is true and perhaps inevitable, but tools will become available to make it easier. The concepts are also more complicated: the idea of objects as URIs, ontologies, etc. I think I now have an idea of how I could create Linked Data from a given dataset. But that’s only part of the process: how is Linked Data discovered and then consumed? For a given postcode, say, how does an application automatically link to data related to that postcode?
Note: if any of the talks have been misrepresented above, the fault is mine rather than the speaker's. Apparently those of us with a relational database background have trouble getting our heads round the concepts of RDF and Linked Data, so I've got an excuse!
Wednesday, 27 April 2011
Friday, 1 April 2011
Automated workflow generation
I've recently started a PhD in on-demand mapping at Manchester Metropolitan University. The project is part-funded by the Ordnance Survey. The aim, briefly, is to develop a workflow system that generates maps according to user preferences. This will involve selecting and sequencing cartographic generalisation algorithms. There have been a number of projects on sequencing generalisation services, but these have tended to target pre-defined outputs.
Anyway, I came across a paper, “Domain Knowledge-based Automatic Generation” (2002), that looks at how to generate a workflow given a set of services, a set of composition rules, a user goal and a set of user preferences. The services are modelled using an ontology. This is necessary because the relationships between services (has component, component of, etc.) are not described by the individual services themselves. Each service consists of a set of attributes and a set of relationships.
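To see why the relationships need to live in an ontology rather than in each service, here is a minimal sketch of a service model with attributes and a `hasComponent` relationship. The inverse, "component of", can only be answered by looking across the whole ontology. The service names are hypothetical examples loosely based on the paper's business case study, not taken from it, and the component graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    attributes: dict = field(default_factory=dict)
    # Relationships to other services, e.g. {"hasComponent": [...]}.
    relationships: dict = field(default_factory=dict)


# Hypothetical service ontology (assumed acyclic).
ontology = {
    "start business": Service("start business", {"domain": "business registration"},
                              {"hasComponent": ["register business name", "register for tax"]}),
    "register business name": Service("register business name"),
    "register for tax": Service("register for tax", {},
                                {"hasComponent": ["register for VAT", "register for corporation tax"]}),
    "register for VAT": Service("register for VAT"),
    "register for corporation tax": Service("register for corporation tax"),
}


def atomic_services(onto, name):
    """Expand a service into its atomic parts by following hasComponent depth-first."""
    components = onto[name].relationships.get("hasComponent", [])
    if not components:
        return [name]
    result = []
    for child in components:
        result.extend(atomic_services(onto, child))
    return result


def component_of(onto, name):
    """Derive the inverse relationship by searching the whole ontology."""
    return sorted(s.name for s in onto.values()
                  if name in s.relationships.get("hasComponent", []))


print(atomic_services(ontology, "start business"))
print(component_of(ontology, "register for VAT"))  # ['register for tax']
```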
The composition rules are also expressed using an ontology. In their case study (starting a new business), the composition rules were derived from government regulations and consist of condition-action pairs. For example, a selection rule may be (business type = limited company, register business name).
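A condition-action rule can be represented as a pair of (conditions that must hold, service to select). The sketch below encodes the post's example rule plus two hypothetical ones and fires every rule whose conditions all match a user profile. It is an illustration of the idea, not the paper's rule ontology.

```python
# Each rule is a (condition, action) pair: every attribute in the condition must
# match the user's profile for the action (a service) to be selected.
RULES = [
    ({"business type": "limited company"}, "register business name"),  # the post's example
    ({"business type": "limited company"}, "register for corporation tax"),  # hypothetical
    ({"employs staff": True}, "register as an employer"),  # hypothetical
]


def select_services(rules, profile):
    """Return the actions of every rule whose conditions all hold, without duplicates."""
    selected = []
    for condition, action in rules:
        if all(profile.get(key) == value for key, value in condition.items()):
            if action not in selected:
                selected.append(action)
    return selected


print(select_services(RULES, {"business type": "limited company", "employs staff": False}))
```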
They define a Workflow Composition Function that, given a set of services in the service ontology, a set of rules in the rule ontology and a set of user preferences, generates a workflow. They also provide an algorithm for generating the workflow.
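I haven't reproduced the paper's algorithm here. As a rough stand-in, a composition function could select services with the rules, pull in any prerequisite services, and then order everything with a topological sort. The prerequisite table, rules and service names below are hypothetical, and ties are broken alphabetically only so the output is deterministic.

```python
def compose_workflow(prerequisites, rules, preferences):
    """Hypothetical composition function: select services by rule, then sequence them.

    prerequisites: service -> set of services that must run before it
    rules: list of (condition dict, service) pairs
    preferences: the user's attribute values
    """
    # 1. Selection: fire every rule whose condition matches the preferences.
    selected = {service for condition, service in rules
                if all(preferences.get(k) == v for k, v in condition.items())}
    # 2. Closure: add the prerequisites of selected services, transitively.
    stack = list(selected)
    while stack:
        for pre in prerequisites.get(stack.pop(), set()):
            if pre not in selected:
                selected.add(pre)
                stack.append(pre)
    # 3. Sequencing: Kahn's topological sort with an alphabetical tie-break.
    indegree = {s: len(prerequisites.get(s, set()) & selected) for s in selected}
    ready = sorted(s for s, d in indegree.items() if d == 0)
    order = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for s in selected:
            if current in prerequisites.get(s, set()):
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        ready.sort()
    if len(order) != len(selected):
        raise ValueError("prerequisites contain a cycle")
    return order


PREREQUISITES = {
    "register business name": set(),
    "open bank account": {"register business name"},
    "register for corporation tax": {"register business name"},
}
RULES = [
    ({"business type": "limited company"}, "open bank account"),
    ({"business type": "limited company"}, "register for corporation tax"),
]

print(compose_workflow(PREREQUISITES, RULES, {"business type": "limited company"}))
# ['register business name', 'open bank account', 'register for corporation tax']
```

For on-demand mapping, the services would be generalisation algorithms and the prerequisites would be ordering constraints between them, which is exactly the part I'd need cartographic rules for.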
Is this method applicable to our on-demand mapping project? Are the government regulations analogous to cartographic rules? A useful next step would be to apply the method to our accident map case study and see what happens.
The website for the project is still available.