CORS, semantic web and linked data

In this post, I talk about CORS and about solutions for using data from a server other than the one that serves the web page using them.

The development of the semantic web and linked data certainly depends on the development of websites that exploit data made available through semantic web technologies. The best-known example is the use of DBPedia to complement the content of a web page.

But using DBPedia from a web page means sending a request to a SPARQL endpoint (en, fr) from the page, then using the data obtained to enrich the page. Obviously, this requires getting around the security rule that forbids a web page from using content from a server other than the one that served the page (unless it is a particular kind of content, such as JavaScript code or a JPEG image). This rule is the same-origin policy, and CORS (Cross-Origin Resource Sharing) is the mechanism by which a server can relax it. In the rest of this post, I will call the server that serves the web page the ‘source server’, and the one from which we try to retrieve data the ‘alien server’.

I read a lot about CORS before I understood one essential thing: to get around this restriction, the server that provides the data must implement specific solutions. Therefore, you cannot use just any source, only a source willing to cooperate with you. We will see later that one of the solutions lets you exploit any source, but through your own server, which becomes the data source for the web page.

Three solutions are known to query an alien server:

  • declaring the source server on the alien server as an authorized requester,
  • sending data in the form of JavaScript code (JSONP method),
  • routing data from the alien server to the web page via the source server (proxy).

In the first solution, the alien server has agreed to register the source server in its list of authorized requesters.

Once this registration is done, a web page that comes from the source server can send requests to the alien server and obtain data.

So there is a strong prerequisite: the administrators of the two servers must be in contact, and the data server (alien) must have registered the other server (source). This is probably the best solution in terms of security, but it is very restrictive. In particular, there is a lot of uncertainty, and a delay, between the moment you identify a data source that you want to use and the moment you can actually use it (if its administrator allows you to).

The way to do this registration depends entirely on the nature of the alien server. For example, I administer shadok, a Virtuoso server (shadok.enst.fr/sparql), and I was able to declare a server (givingsense.eu) that hosts pages making SPARQL queries on the shadok server. To do this, I added http://givingsense.eu to the list of servers accepted by shadok, following the instructions found here.

A basic example of usage is visible by following this link:

http://givingsense.eu/testapp/testcors.html
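
To make the mechanism more concrete, here is a minimal sketch of what the alien server does once the source server is registered. This is not Virtuoso's code: the names allowedOrigins and corsHeadersFor are hypothetical and only illustrate the principle. The browser sends the origin of the page in the Origin request header, and the alien server answers with an Access-Control-Allow-Origin header only when that origin is in its list.

```javascript
// Hypothetical allow-list kept by the alien server (illustration only).
const allowedOrigins = ["http://givingsense.eu"];

// Return the CORS response headers to add to an answer,
// given the value of the Origin header sent by the browser.
function corsHeadersFor(requestOrigin, allowList) {
  if (allowList.includes(requestOrigin)) {
    return {
      // Echo the origin back so the browser lets the page read the answer.
      "Access-Control-Allow-Origin": requestOrigin,
      // Tell caches that the answer depends on the Origin header.
      "Vary": "Origin",
    };
  }
  // Unknown origin: no CORS header, so the browser blocks the page from reading the answer.
  return {};
}
```

With this list, a page served by http://givingsense.eu gets the header, and a page served by any other origin does not.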

JSONP method

The idea here is simple: because pages are allowed to load JavaScript, send them JavaScript. The data server (alien) sends a function call that receives the data as a parameter. The source page must define the code of that function. This solution is known as JSONP.

The page contains, for example, the following function definition:

var queryAnswer; // global variable that receives the query result
function myjsonp(data) { // defines the callback function whose name is passed as a parameter of the request
  queryAnswer = data;
}

The following URL, which can be obtained by running a test query directly in the user interface at dbpedia.org/sparql,

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3FConcept+where+%7B%5B%5D+a+%3FConcept%7D+LIMIT+5&format=application%2Fsparql-results%2Bjson

gives the following result:

{ "head": { "link": [], "vars": ["Concept"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "Concept": { "type": "uri", "value": "http://dbpedia.org/ontology/Image" }},
    { "Concept": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" }},
    { "Concept": { "type": "uri", "value": "http://xmlns.com/foaf/0.1/Person" }},
    { "Concept": { "type": "uri", "value": "http://dbpedia.org/ontology/Person" }},
    { "Concept": { "type": "uri", "value": "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent" }} ] } }

But if you make this request in a web page, it will fail because it will look for data on a server different from the origin of the page.

The fix is to add the following parameter to the above query:

&callback=myjsonp

This sets the callback parameter to myjsonp (or whatever name you gave to your callback function). The Virtuoso server that hosts DBPedia uses this parameter to encapsulate the answer in a call to the myjsonp function.

The browser will execute this query if it is presented as the loading of JavaScript code:

<script type="text/javascript" src="…here the above request…"></script>

Thus, when it interprets this line, the browser receives a JavaScript function call from the alien server and executes it. With our very simple myjsonp sample function, the result of the SPARQL request is stored in the global variable queryAnswer.
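
Once the answer is stored, the page still has to read values out of it. As a small sketch, the helper below (bindingValues is a name I made up for this example) walks the results.bindings array of a SPARQL JSON answer, such as the one shown earlier, and collects the values bound to one variable.

```javascript
// Collect the values bound to a given variable in a SPARQL JSON answer.
// "answer" is expected to have the shape { head: {...}, results: { bindings: [...] } }.
function bindingValues(answer, variable) {
  return answer.results.bindings
    // Skip rows where the variable is unbound.
    .filter((binding) => binding[variable] !== undefined)
    // Keep only the value, dropping the "type" information.
    .map((binding) => binding[variable].value);
}
```

For the query used in this post, calling bindingValues(queryAnswer, "Concept") would give the list of concept URIs.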

The alien server (here, Virtuoso) implements a specific treatment of the request: instead of returning the raw data, it sends the data encapsulated in a function call.

Thus, for DBPedia, if you ask for a JSON response and add the callback parameter, you get JSONP. For the above query, the received result is:

myjsonp(

{ "head": { "link": [], "vars": ["Concept"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "Concept": { "type": "uri", "value": "http://dbpedia.org/ontology/Image" }},
    { "Concept": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" }},
    { "Concept": { "type": "uri", "value": "http://xmlns.com/foaf/0.1/Person" }},
    { "Concept": { "type": "uri", "value": "http://dbpedia.org/ontology/Person" }},
    { "Concept": { "type": "uri", "value": "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent" }} ] } })

We see that it is the same result as before, wrapped in myjsonp(…).
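
On the server side, the wrapping itself is very little work. The sketch below is not how Virtuoso does it (I have not looked at its code); wrapAsJsonp is a hypothetical function that shows the principle, with a check on the callback name that I added so that a request cannot inject arbitrary JavaScript.

```javascript
// Wrap a JSON-serializable value in a call to the function named by the "callback" parameter.
function wrapAsJsonp(callbackName, data) {
  // Accept only a plain JavaScript identifier as callback name.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("Invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ")";
}
```

The string returned by wrapAsJsonp("myjsonp", result) is what the browser then executes as a script.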

An example is shown here:

http://givingsense.eu/testapp/dbpediaJsonp.html

For dynamic behavior, you need to write some JavaScript code that creates the script tag and injects it into the page.
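
Here is a minimal sketch of such code. The function names buildJsonpUrl and injectJsonpScript are mine, and the second one takes the document as an optional parameter only to make it easy to try outside a browser. The callback function (myjsonp in this post) must already be defined in the page.

```javascript
// Build the request URL: the SPARQL query, the JSON format and the JSONP callback name.
function buildJsonpUrl(endpoint, sparql, callbackName) {
  const params = new URLSearchParams({
    query: sparql,
    format: "application/sparql-results+json",
    callback: callbackName,
  });
  return endpoint + "?" + params.toString();
}

// Create a script tag pointing at the URL and add it to the page.
// The browser loads the URL and executes the function call it receives.
function injectJsonpScript(url, doc = document) {
  const script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = url;
  doc.head.appendChild(script);
  return script;
}
```

In a page, injectJsonpScript(buildJsonpUrl("http://dbpedia.org/sparql", "select distinct ?Concept where {[] a ?Concept} LIMIT 5", "myjsonp")) would trigger the query of this post each time it is called.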

Proxy

The last solution that I will present is the use of a proxy on the source server.

Because the server that sends data must trust the page that asks for them, one solution is to request these data from the server that provided the page.

In this case, the page sends the request (for example, the one defined in the previous section) as a parameter of a service on the source server. The service running on the source server retrieves this parameter, sends the request to the alien server, receives the result and returns it to the page.
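
As a rough sketch of such a service, assuming the source server runs Node.js 18 or later (for the built-in fetch), the handler below has the (req, res) signature expected by Node's http.createServer. The names ALIEN_ENDPOINT, alienUrlFor and proxyHandler are hypothetical, and the alien endpoint is fixed in the code so that the service cannot be used to reach arbitrary servers.

```javascript
// Alien endpoint queried by the proxy (an assumption for this example).
const ALIEN_ENDPOINT = "http://dbpedia.org/sparql";

// Build the URL to call on the alien server from the URL received by the proxy.
// Returns null when the "query" parameter is missing.
function alienUrlFor(requestUrl) {
  const incoming = new URL(requestUrl, "http://localhost");
  const sparql = incoming.searchParams.get("query");
  if (!sparql) {
    return null;
  }
  const params = new URLSearchParams({
    query: sparql,
    format: "application/sparql-results+json",
  });
  return ALIEN_ENDPOINT + "?" + params.toString();
}

// Retrieve the parameter, send the request to the alien server, return the result to the page.
async function proxyHandler(req, res) {
  const target = alienUrlFor(req.url);
  if (target === null) {
    res.writeHead(400, { "Content-Type": "text/plain" });
    res.end("Missing query parameter");
    return;
  }
  try {
    const answer = await fetch(target);
    const body = await answer.text();
    res.writeHead(answer.status, { "Content-Type": "application/sparql-results+json" });
    res.end(body);
  } catch (err) {
    // The alien server could not be reached.
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("Alien server unreachable");
  }
}
```

The handler would be started with something like require("http").createServer(proxyHandler).listen(8080), and the page would then query its own server instead of the alien one.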

This solution has drawbacks:

  • it adds a processing load on the source server,
  • it requires a transfer of data from the alien server to the source server, then from the source server to the page that has sent the request,
  • it adds latency.

But it also has advantages:

  • it can query any data source, at the cost of implementing something like a caching proxy,
  • it reduces the dependence on the availability of the alien server, and with a cache the latency can even be reduced (in the example application above, the SPARQL query would run once on the alien server and the result would be stored on the source server; subsequent executions of the same query would get the result directly from the cache on the source server).
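
The cache mentioned in the last point can be sketched in a few lines. This is only an illustration under simple assumptions: the cache lives in memory, never expires, and createCachedQuery and runQuery are hypothetical names, where runQuery stands for whatever function actually sends the SPARQL query to the alien server.

```javascript
// Wrap a query function so that each distinct SPARQL query reaches the alien server only once.
function createCachedQuery(runQuery) {
  const cache = new Map(); // SPARQL text -> promise of the result

  return function cachedQuery(sparql) {
    if (!cache.has(sparql)) {
      const pending = Promise.resolve(runQuery(sparql)).catch((err) => {
        // Do not keep failures, so that a later call can try again.
        cache.delete(sparql);
        throw err;
      });
      cache.set(sparql, pending);
    }
    return cache.get(sparql);
  };
}
```

A real proxy would also need an expiry rule, since the data on the alien server may change.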

I will give details on implementing this solution in a future post.

Going further

A more detailed presentation of CORS calls:

http://www.html5rocks.com/en/tutorials/cors/

JSONP support in the Ajax calls offered by jQuery:

http://api.jquery.com/jQuery.ajax/#jQuery-ajax-settings

where it may be noted in particular that jQuery converts requests for JSON data into JSONP requests when the request targets an origin other than that of the current page.

Some additional details on CORS:

http://margaine.com/2014/06/28/jsonp-vs-cors.html

Conclusion

You will probably need the proxy solution for data sources that offer neither of the other two solutions, so it could be a good idea to support the proxy solution on your server now. We will see how to do it in a future post.

You will see that more and more sources offer the JSONP solution; it is quite easy to implement and will probably make your pages more dynamic.

Finally, the declarative solution is limited to data sources whose configuration you can change yourself (or have someone change for you): this will surely be the least common case. When it is possible, remember that it is still the safest.
