API Help
There are a number of ways to access the data in PublishMyData-powered sites. This page describes Version 1.0 of our API. We expect to add additional features in future, but we'll try to make any changes backward compatible. That might not always be possible, but if we stop supporting any of the features listed here, we'll let you know.
A note about domain names:
The examples in this document use publishmydata.com as the domain for the requests, but the concepts apply to all PublishMyData-powered sites. If you're accessing data from another site, simply replace the domain name with that of the site you wish to query (e.g. http://opendatacommunities.org/sparql.json?query=... )
Linked Data Browsing
URI Dereferencing
Following the standard practices for Linked Data, we distinguish between a resource and documents about a resource. Identifiers for the resource follow the pattern:
http://{domain}/id/{...}
When you look them up you get redirected to the corresponding document about that thing. The document URL follows the pattern:
http://{domain}/doc/{...}
For example, in our Transport for London dataset, the identifier for Euston Underground Station is
http://publishmydata.com/id/transport/uk/transport-for-london/station/euston
If you put this into your browser you get redirected to an HTML page about Euston
http://publishmydata.com/doc/transport/uk/transport-for-london/station/euston
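The relationship between the two patterns can be sketched in a few lines of Ruby. This is only an illustration: doc_url_for is a hypothetical helper name, not part of the API, and it assumes the identifier follows the http://{domain}/id/{...} pattern. In real code you would normally just follow the redirect the server sends.

```ruby
require 'uri'

# Hypothetical helper (not part of the API): work out the document URL
# that a resource identifier is expected to redirect to, by swapping the
# leading /id/ path segment for /doc/.
def doc_url_for(identifier)
  uri = URI.parse(identifier)
  unless uri.path.start_with?('/id/')
    raise ArgumentError, "not a resource identifier: #{identifier}"
  end
  uri.path = uri.path.sub(%r{\A/id/}, '/doc/')
  uri.to_s
end

puts doc_url_for('http://publishmydata.com/id/transport/uk/transport-for-london/station/euston')
```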
Resource Document Formats
You can specify what format you want that document to be in. By default you get HTML in a human-readable form, but you can also ask for the document in one of several RDF formats: RDF/XML, N-triples, Turtle or RDF/JSON.
There are two ways to specify which format you want: you can append a format extension to the URI or you can use the HTTP 'Accept' header. For both of these approaches, you can apply it either to the resource identifier, the .../id/... URI, or the document address, the .../doc/... URI.
| Format | Extension | Accept Header |
|---|---|---|
| RDF/XML | .rdf | application/rdf+xml |
| N-triples | .nt | text/n3 or text/plain |
| Turtle | .ttl | text/turtle |
| JSON | .json | application/json |
Ruby Example
Here's an example of dereferencing a URI using the Ruby 'RestClient' library. Similar approaches can be taken in other languages.
This assumes you already have Ruby set up on your system. Also, if you don't already have it, you'll need to install the rest-client gem:
gem install rest-client
Rest Client documentation is here.
require 'rest-client'

# Specify the format as an extension - in this case JSON.
# This involves two requests, because doing an HTTP GET on the resource identifier
# gives you a 303 redirect to the appropriate document page.
# RestClient looks after that for you.
puts RestClient.get 'http://publishmydata.com/id/transport/uk/transport-for-london/station/euston.json'

# You get the same result if you ask for the document page directly.
puts RestClient.get 'http://publishmydata.com/doc/transport/uk/transport-for-london/station/euston.json'

# You can also specify the format with the Accept header - in this case asking for RDF/XML.
puts RestClient.get 'http://publishmydata.com/id/transport/uk/transport-for-london/station/euston', :accept=>'application/rdf+xml'
Alternative URLs for convenient browsing
Datasets
Alongside the definitive URI for a resource, we offer additional URLs for the information about resources, reflecting the way we organise the data into datasets and by type. These offer some convenient ways to navigate and access the data.
Our 'Transport for London' dataset of Tube stations and Tube lines has an identifier of:
http://publishmydata.com/id/dataset/transport/uk/transport-for-london
but it can also be accessed at:
http://publishmydata.com/datasets/transport-for-london
List resources of a type
The following URL:
http://publishmydata.com/datasets/transport-for-london/tube-stations
provides a list of all resources in the Transport for London dataset which have a type of:
http://publishmydata.com/def/transport/tube#TubeStation
Individual resources of a type
An individual station can be found at (for example)
http://publishmydata.com/datasets/transport-for-london/tube-stations/euston
This view is the same as you get via
http://publishmydata.com/doc/transport/uk/transport-for-london/station/euston
These URLs follow the pattern:
http://{domain}/datasets/{dataset short name}/{type short name}/{resource short name}[.{format}]
Formats
As with the basic Linked Data Browsing, the information about a resource can be retrieved in multiple formats. Add the format extension to the URL or use the HTTP Accept header as explained above.
The list of all resources of a given type in a given dataset, together with the available triples about those resources, can also be retrieved in multiple formats, using the approaches described above. For example
http://publishmydata.com/datasets/transport-for-london/tube-stations.nt
These results are paged. Use the parameters _page and _per_page to control the paging process. See the SPARQL 'Paging' section below for more details.
To get dataset metadata in machine readable formats, you can use the pattern
http://{domain}/datasets/{dataset short name}[.{format}]
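The convenience URL patterns in this section can be assembled with a helper along these lines. It is a minimal sketch: convenience_url and its keyword arguments are hypothetical names, and the short names have to be ones that actually exist on the site you are querying.

```ruby
# Hypothetical helper: build a convenience URL for a dataset, a list of
# resources of one type, or a single resource, with an optional format.
def convenience_url(domain, dataset, type: nil, resource: nil, format: nil)
  raise ArgumentError, 'a resource short name needs a type short name' if resource && type.nil?

  segments = ['datasets', dataset, type, resource].compact
  url = "http://#{domain}/" + segments.join('/')
  url += ".#{format}" if format
  url
end

# Dataset metadata as JSON
puts convenience_url('publishmydata.com', 'transport-for-london', format: 'json')
# All tube stations as N-triples
puts convenience_url('publishmydata.com', 'transport-for-london', type: 'tube-stations', format: 'nt')
# One station, default HTML
puts convenience_url('publishmydata.com', 'transport-for-london', type: 'tube-stations', resource: 'euston')
```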
Resources in external domains
When minting URIs to identify resources we want to talk about, the usual Linked Data practice is to create those URIs in a domain you control, so that it is possible to respond to them in the ways described above.
However, there are times when it is useful to hold information about external URIs in a triple store - that is URIs in a domain that we don't control. Information about those URIs can be retrieved using SPARQL, but it's also useful to have a standard URL pattern to access them.
This is possible using the pattern:
http://{domain}/resources/{external identifier, with 'http://' removed}
For example, we have a copy of the Ordnance Survey postcode data in PublishMyData, as it is very useful for many geographical queries. The Ordnance Survey identifier for SW1A 1AA is
http://data.ordnancesurvey.co.uk/id/postcodeunit/SW1A1AA
Our copy of the information can be accessed at
http://publishmydata.com/resources/data.ordnancesurvey.co.uk/id/postcodeunit/SW1A1AA
Often these external resources will be organised into a dataset, and so accessible via the convenience URLs described above, e.g.
http://publishmydata.com/datasets/postcodes/postcode-units/SW1A1AA
but this additional /resources pattern allows arbitrary external resources to be addressed, whether or not they are in one of our datasets.
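Turning an external identifier into a /resources URL is a simple string operation. The sketch below assumes the identifier starts with 'http://', as in the pattern above; external_resource_url is a hypothetical helper name.

```ruby
# Hypothetical helper: map an external identifier onto the /resources
# pattern by removing the leading 'http://' and prefixing our domain.
def external_resource_url(domain, external_uri)
  unless external_uri.start_with?('http://')
    raise ArgumentError, "expected an http:// identifier, got #{external_uri}"
  end
  "http://#{domain}/resources/#{external_uri.sub(%r{\Ahttp://}, '')}"
end

puts external_resource_url('publishmydata.com',
                           'http://data.ordnancesurvey.co.uk/id/postcodeunit/SW1A1AA')
```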
SPARQL
Introduction to SPARQL
The most flexible way to access the data is by using SPARQL. To submit a SPARQL query from your code, issue an HTTP GET to
http://{domain}/sparql.{format}?query={URL-encoded query}
For example, to run this simple query
SELECT * WHERE {<http://publishmydata.com/id/transport/uk/transport-for-london/station/euston> ?p ?o}
and get the results as JSON, you need to GET the following URL (note the .json extension)
http://publishmydata.com/sparql.json?query=SELECT+%2A+WHERE+%7B%3Chttp%3A%2F%2Fpublishmydata.com%2Fid%2Ftransport%2Fuk%2Ftransport-for-london%2Fstation%2Feuston%3E+%3Fp+%3Fo%7D
See the SPARQL Results Formats section below for more details of the different formats available.
Most languages have simple libraries for URL-encoding strings. This simple example will work in Ruby.
require 'rubygems'
require 'cgi'
query = 'SELECT * WHERE {<http://publishmydata.com/id/transport/uk/transport-for-london/station/euston> ?p ?o}'
encodedquery = CGI::escape(query)
puts encodedquery
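Putting the encoding step together with the URL pattern gives the full request URL. This is a sketch under the assumptions already stated in this section; sparql_url is a hypothetical helper name.

```ruby
require 'cgi'

# Hypothetical helper: build the GET URL for a SPARQL query following
# the pattern http://{domain}/sparql.{format}?query={URL-encoded query}
def sparql_url(domain, query, format)
  "http://#{domain}/sparql.#{format}?query=#{CGI.escape(query)}"
end

query = 'SELECT * WHERE {<http://publishmydata.com/id/transport/uk/transport-for-london/station/euston> ?p ?o}'
puts sparql_url('publishmydata.com', query, 'json')
```

The URL printed here should match the example URL given earlier in this section.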
SPARQL Results formats
The available formats depend on the type of SPARQL query. A SPARQL query can be one of four main forms: SELECT, CONSTRUCT, DESCRIBE or ASK.
| Query Type | Format | Extension | Accept Header |
|---|---|---|---|
| SELECT | xml | .xml | application/xml or application/sparql-results+xml |
| SELECT | json | .json | application/json or application/sparql-results+json |
| SELECT | text | .text | text/plain |
| SELECT | csv | .csv | text/csv |
| CONSTRUCT | xml | .xml | application/xml or application/sparql-results+xml |
| CONSTRUCT | turtle | .ttl | text/turtle |
| ASK | xml | .xml | application/xml or application/sparql-results+xml |
| ASK | json | .json | application/json or application/sparql-results+json |
| DESCRIBE | Not supported at the moment: we'll add it soon. | | |
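Because the available formats depend on the query form, it can be handy to check a combination before sending a request. The sketch below simply restates the table as a lookup; the constant and method names are hypothetical, and you pass in the query form yourself rather than having it parsed out of the query text.

```ruby
# Extensions available for each query form, copied from the table.
# DESCRIBE is listed with no formats because it is not supported yet.
SPARQL_FORMATS = {
  'SELECT'    => %w[xml json text csv],
  'CONSTRUCT' => %w[xml ttl],
  'ASK'       => %w[xml json],
  'DESCRIBE'  => []
}.freeze

# True if the extension (with or without a leading dot) is listed
# for the given query form.
def sparql_format_supported?(query_form, extension)
  SPARQL_FORMATS.fetch(query_form.upcase, []).include?(extension.sub(/\A\./, ''))
end

puts sparql_format_supported?('select', '.csv')
puts sparql_format_supported?('CONSTRUCT', 'json')
```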
Errors
If you make a SPARQL request with a malformed query to any of the formats above (i.e. not via the HTML form at /sparql), then a blank response will be returned, with HTTP status 400.
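Since the 400 response has no body, your code needs to look at the status rather than try to parse the result. Here is one way that might be handled. It is a sketch, not a recommendation: it uses Net::HTTP from the Ruby standard library instead of rest-client so that it has no gem dependencies, and MalformedQueryError, check_sparql_response and fetch_sparql are hypothetical names.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class MalformedQueryError < StandardError; end

# Apply the rule described above: HTTP status 400 means the query was
# rejected, and the (blank) body should not be parsed.
def check_sparql_response(status, body)
  if status.to_i == 400
    raise MalformedQueryError, 'endpoint returned HTTP 400: the query is probably malformed'
  end
  body
end

# Fetch a SPARQL URL and apply the check. Defined but not called here,
# because it needs network access.
def fetch_sparql(url)
  response = Net::HTTP.get_response(URI.parse(url))
  check_sparql_response(response.code, response.body)
end
```

You would call fetch_sparql with a URL built as described in the introduction, and rescue MalformedQueryError where you want to report the problem.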
JSON-P
If you're requesting JSON, you can additionally pass a callback parameter and the results will be wrapped in that function. This is useful for getting round cross-domain issues if you're writing JavaScript. For example:
http://publishmydata.com/sparql.json?callback=myCallbackFunction&query=SELECT+%2A+WHERE+%7B%3Chttp%3A%2F%2Fpublishmydata.com%2Fid%2Ftransport%2Fuk%2Ftransport-for-london%2Fstation%2Feuston%3E+%3Fp+%3Fo%7D
Or, to make a JSONP request with jQuery, you can omit the callback parameter from the URL and just set the dataType to jsonp:
var queryUrl = 'http://publishmydata.com/sparql.json?query=SELECT+%2A+WHERE+%7B%3Chttp%3A%2F%2Fpublishmydata.com%2Fid%2Ftransport%2Fuk%2Ftransport-for-london%2Fstation%2Feuston%3E+%3Fp+%3Fo%7D';
$.ajax({
  dataType: 'jsonp',
  url: queryUrl,
  success: function(data) {
    // callback code here.
  }
});
Paging
The results of SELECT queries through PublishMyData SPARQL endpoints are paged. We take this approach to make sure that queries respond quickly and to avoid queries with very large result sets putting undue load on the server. The maximum number of results per page is 1000. The default for machine-readable formats is 1000 results per page, and for HTML format results is 20 per page. (We are still experimenting with this feature and would welcome your feedback on it.)
There are two parameters that can be added to the URLs described above to control paging. These are:
- _per_page (defaults to 20, maximum 100)
- _page (defaults to 1)
For example, this query returns all resources of type TubeStation
SELECT * WHERE {?s a <http://publishmydata.com/def/transport/tube#TubeStation>}
To get results 101 to 200 in text format, use this URL:
http://publishmydata.com/sparql.text?_page=2&_per_page=100&query=SELECT+%2A+WHERE+%7B%3Fs+a+%3Chttp%3A%2F%2Fpublishmydata.com%2Fdef%2Ftransport%2Ftube%23TubeStation%3E%7D
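The arithmetic behind the example is straightforward: page n with p results per page covers results (n - 1) * p + 1 to n * p. The following sketch spells that out; both method names are hypothetical.

```ruby
# Range of result numbers (counting from 1) covered by a given page.
def result_range(page, per_page)
  first = (page - 1) * per_page + 1
  last = page * per_page
  first..last
end

# Page number on which a given result number falls.
def page_containing(result_number, per_page)
  (result_number - 1) / per_page + 1
end

puts result_range(2, 100)     # 101..200
puts page_containing(101, 100) # 2
```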
The results of CONSTRUCT queries are currently limited to 1000 triples and we don't do automatic paging. If you have a CONSTRUCT query with a bigger result than that, you'll need to do your own paging using the SPARQL 'OFFSET' and 'LIMIT' keywords.
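Doing your own paging for a CONSTRUCT query means appending LIMIT and OFFSET to the query text before encoding it. The sketch below shows the idea; paged_query is a hypothetical helper, and it assumes the query does not already end with its own LIMIT or OFFSET. Two things to bear in mind: LIMIT counts query solutions rather than triples, so choose a page size that keeps each page under the triple limit, and without an ORDER BY clause SPARQL does not guarantee that solutions come back in the same order on every request.

```ruby
# Hypothetical helper: add LIMIT/OFFSET for a given page (counting
# from 1) to a query that has no LIMIT or OFFSET of its own.
def paged_query(query, page, page_size)
  offset = (page - 1) * page_size
  "#{query.strip} LIMIT #{page_size} OFFSET #{offset}"
end

construct = 'CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o} ORDER BY ?s ?p ?o'
(1..3).each { |page| puts paged_query(construct, page, 500) }
```

You would keep requesting pages until one comes back with no triples.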
This sample Ruby code will loop through all pages of the results of a query and combine them into a single array
require 'rubygems'
require 'cgi'
require 'rest-client'
require 'json'
# find all resources of type TubeStation
query = 'SELECT * WHERE {?s a <http://publishmydata.com/def/transport/tube#TubeStation>}'
encodedquery = CGI::escape(query)
# results per page
per_page = 100
base_url = 'http://publishmydata.com/sparql.json?query=' + encodedquery
# the final result is an array of hashes
# each element of the array looks like:
# {'s'=>{'value'=>'http://publishmydata.com/id/transport/uk/transport-for-london/station/balham', 'type'=>'uri'}}
result = [] # we add the results into this array, page by page
done = false
page = 1
while (!done)
  query_url = base_url + "&_page=#{page}&_per_page=#{per_page}"
  part_result = JSON.parse(RestClient.get(query_url))
  part_result_array = part_result['results']['bindings'] # this reflects the hash structure of the JSON returned
  if (part_result_array.length > 0)
    result = result + part_result_array
    page += 1
  else
    done = true
  end
end
puts 'total number of results = ' + result.length.to_s
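Once you have the combined array, you will usually want the plain values rather than the nested hashes. The sketch below shows one way to pull them out. The sample JSON is made up for illustration (it follows the structure described in the comments above, using two station URIs from this page), and values_for is a hypothetical helper name.

```ruby
require 'json'

# Made-up sample response, shaped like the JSON results described above.
SAMPLE_RESULTS = <<~SAMPLE
  {
    "head": { "vars": ["s"] },
    "results": {
      "bindings": [
        { "s": { "type": "uri", "value": "http://publishmydata.com/id/transport/uk/transport-for-london/station/balham" } },
        { "s": { "type": "uri", "value": "http://publishmydata.com/id/transport/uk/transport-for-london/station/euston" } }
      ]
    }
  }
SAMPLE

# Collect the 'value' of one variable from each row of bindings,
# skipping rows where that variable is not bound.
def values_for(bindings, variable)
  bindings.map { |row| row[variable] }.compact.map { |cell| cell['value'] }
end

bindings = JSON.parse(SAMPLE_RESULTS)['results']['bindings']
puts values_for(bindings, 's')
```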
Use of named graphs
Each dataset in PublishMyData-driven sites is contained within a separate named graph. The dataset itself has a URI, for example
http://publishmydata.com/id/dataset/transport/uk/transport-for-london
The web page for the dataset lists the named graph that contains the dataset, in this case
http://publishmydata.com/id/graph/transport/uk/transport-for-london
The graph name for the dataset is contained in the dataset metadata, using a predicate called http://publishmydata.com/def/dataset#graph and can be obtained by a query like this:
SELECT ?graph
WHERE {
  <http://publishmydata.com/id/dataset/transport/uk/transport-for-london> <http://publishmydata.com/def/dataset#graph> ?graph
}
The graph URI can then be used to restrict the results of a query to triples contained in that graph, as follows.
SELECT * WHERE {
  GRAPH <http://publishmydata.com/id/graph/transport/uk/transport-for-london> {
    ?s ?p ?o
  }
}
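If you work with several datasets, both queries can be generated from the dataset and graph URIs. This is a minimal sketch: the method names are hypothetical, and the predicate URI is the publishmydata.com one quoted above, which we assume here rather than look up.

```ruby
# Predicate linking a dataset to its named graph, as quoted above.
GRAPH_PREDICATE = 'http://publishmydata.com/def/dataset#graph'

# Query that looks up the named graph for a dataset URI.
def graph_lookup_query(dataset_uri)
  "SELECT ?graph WHERE { <#{dataset_uri}> <#{GRAPH_PREDICATE}> ?graph }"
end

# Query that restricts a triple pattern to one named graph.
def graph_restricted_query(graph_uri, pattern = '?s ?p ?o')
  "SELECT * WHERE { GRAPH <#{graph_uri}> { #{pattern} } }"
end

puts graph_lookup_query('http://publishmydata.com/id/dataset/transport/uk/transport-for-london')
puts graph_restricted_query('http://publishmydata.com/id/graph/transport/uk/transport-for-london')
```

The resulting strings still need to be URL-encoded before they are sent to the SPARQL endpoint.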