Getting Started with the Springer SPARQL Endpoint

As part of the SemBib project, I will explore with you the public SPARQL endpoint of the scientific publisher Springer at http://lod.springer.com/sparql-form/index.html. For a first contact we need to get acquainted with the data, and a few classic queries will help us.

First, discover the properties used:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX spr: <http://lod.springer.com/data/ontology/property/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
select distinct ?p ?label where {
?s ?p ?o .
OPTIONAL {
?p rdfs:label ?label .
}
}
limit 100

This takes about 23 seconds. (I cheated a little: I ran the query a first time to see which URIs were used, then defined the corresponding prefixes so that the result below is more compact.)

It gives:

--------------------------------------------------------------
| p                           | label                        |
==============================================================
| rdf:type                    |                              |
| dc:creator                  |                              |
| dc:date                     |                              |
| dc:description              |                              |
| dc:publisher                |                              |
| dc:rights                   |                              |
| rdfs:label                  |                              |
| rdfs:domain                 |                              |
| rdfs:range                  |                              |
| rdfs:subPropertyOf          |                              |
| spr:hasDBLPID               | "has DBLP ID"@en             |
| spr:confSeriesName          | "conference series name"@en  |
| spr:confYear                | "conference year"@en         |
| spr:confAcronym             | "conference acronym"@en      |
| spr:confName                | "conference name"@en         |
| spr:confCity                | "conference city"@en         |
| spr:confCountry             | "conference country"@en      |
| spr:confStartDate           | "conference start date"@en   |
| spr:confEndDate             | "conference end date"@en     |
| spr:hasSeries               | "ConferenceSeries"@en        |
| spr:volumeNumber            | "Volume number"@en           |
| spr:title                   | "Title"@en                   |
| spr:subtitle                | "Subtitle"@en                |
| spr:ISBN                    | "ISBN"@en                    |
| spr:EISBN                   | "eISBN"@en                   |
| spr:bookSeriesAcronym       | "book series acronym"@en     |
| spr:hasConference           | "Conference"@en              |
| spr:bookDOI                 | "DOI"@en                     |
| spr:isIndexedByScopus       | "Is indexed by Scopus"@en    |
| spr:scopusSearchDate        | "Scopus search date"@en      |
| spr:isAvailableAt           | "Available at"@en            |
| spr:isIndexedByCompendex    | "Is indexed by Compendex"@en |
| spr:compendexSearchDate     | "Compendex search date"@en   |
| spr:confNumber              | "conference number"@en       |
| spr:copyrightYear           | "Copyright year"@en          |
| spr:firstPage               | "First page"@en              |
| spr:lastPage                | "Last page"@en               |
| spr:chapterRegistrationDate | "Registration date"@en       |
| spr:chapterOf               | "Book"@en                    |
| spr:chapterOnlineDate       | "Online date"@en             |
| spr:copyrightHolder         | "Copyright Holder"@en        |
| spr:metadataRights          | "Metadata Rights"@en         |
| spr:abstractRights          | "Abstract Rights"@en         |
| spr:bibliographyRights      | "Bibliography Rights"@en     |
| spr:bodyHtmlRights          | "Body HTML Rights"@en        |
| spr:bodyPdfRights           | "Body PDF Rights"@en         |
| spr:esmRights               | "ESM Rights"@en              |
--------------------------------------------------------------

On the one hand, this shows us that performance is not exceptional. On the other hand, we see that Springer has for the most part defined its own ontology: the data are accessible, but not really connected to the rest of the world through shared concepts. The data are described with 47 predicates (properties).

The following query (without repeating the prefixes defined above) gives the number of distinct 'subjects' described in the database: 451277.

select (count(distinct ?s) as ?size) where {
?s ?p ?o .
}

The next query gives the number of triples describing these subjects: 3490865, that is, about 8 triples per distinct subject. That is not much for describing bibliographic references in detail, so we can expect little data on each reference.

select (count(?s) as ?size) where {
?s ?p ?o .
}
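
The two counts above can also be obtained in a single pass, together with their ratio. This is only a sketch: I have not run it against the Springer endpoint, and it assumes the server accepts SPARQL 1.1 aggregates inside select expressions. The variable names ?triples, ?subjects and ?ratio are my own.

```sparql
# Count all triples and all distinct subjects in one query, and derive
# the average number of triples per subject from the two aggregates.
select (count(*) as ?triples)
       (count(distinct ?s) as ?subjects)
       (count(*) / count(distinct ?s) as ?ratio)
where {
  ?s ?p ?o .
}
```

With the figures reported above, the ratio should come out near 3490865 / 451277 ≈ 7.7.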

The relatively small number of triples per subject prompts me to look for the most used predicates:

select ?p (count(?p) as ?freq) where {
?s ?p ?o .
}
group by ?p
order by desc(?freq)

giving (after leaving out the least used, which mostly relate to rights):

----------------------------------------
| p                           | freq   |
========================================
| rdf:type                    | 451316 |
| spr:bookDOI                 | 441266 |
| spr:title                   | 439864 |
| spr:chapterOf               | 381656 |
| spr:firstPage               | 381646 |
| spr:lastPage                | 381646 |
| spr:chapterRegistrationDate | 245143 |
| spr:chapterOnlineDate       | 188209 |
| spr:EISBN                   | 59611  |
| spr:isIndexedByScopus       | 59370  |
| spr:scopusSearchDate        | 59370  |
| spr:ISBN                    | 59101  |
| spr:isAvailableAt           | 59101  |
| spr:copyrightYear           | 55964  |
| spr:subtitle                | 40321  |
| spr:volumeNumber            | 34988  |
| spr:bookSeriesAcronym       | 17665  |
| spr:compendexSearchDate     | 11400  |
| spr:isIndexedByCompendex    | 11400  |
| spr:hasConference           | 9509   |
| spr:confCity                | 8487   |
| spr:confCountry             | 8487   |
| spr:confName                | 8487   |
| spr:hasSeries               | 8487   |
| spr:confEndDate             | 8479   |
| spr:confStartDate           | 8479   |
| spr:confYear                | 8479   |
| spr:confAcronym             | 8233   |
| spr:confNumber              | 8021   |
----------------------------------------

We see that most of the information available on an item in the database consists of its type, its DOI, its title, the book it is a chapter of, and its first and last pages. The other information relates mainly to the conferences the documents may come from.

Predicates with domain and range

We see that the properties rdfs:domain (the class of subjects to which a predicate applies) and rdfs:range (the class of possible values for the predicate) seem to be defined for some predicates.

With the same small cheat as above for the prefixes, the following query gives us, in 15 seconds, the domains and ranges used:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX spr: <http://lod.springer.com/data/ontology/property/>               
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
PREFIX sxs: <https://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX spc:<http://lod.springer.com/data/ontology/class/>                       
select distinct ?p ?domain ?range  where {
?s ?p ?o .
?p rdfs:domain ?domain .
?p rdfs:range ?range .
}
limit 100

The result is:

------------------------------------------------------------------------------
| p                           | domain                | range                |
==============================================================================
| spr:confSeriesName          | spc:ConferenceSeries  | rdf:langString       |
| spr:confYear                | spc:Conference        | xs:date              |
| spr:confAcronym             | spc:Conference        | rdf:langString       |
| spr:confName                | spc:Conference        | rdf:langString       |
| spr:confCity                | spc:Conference        | rdf:langString       |
| spr:confCountry             | spc:Conference        | rdf:langString       |
| spr:confStartDate           | spc:Conference        | xs:date              |
| spr:confEndDate             | spc:Conference        | xs:date              |
| spr:hasSeries               | spc:Conference        | spc:ConferenceSeries |
| spr:confNumber              | spc:Conference        | xs:int               |
| spr:volumeNumber            | spc:ProceedingsVolume | rdfs:literal         |
| spr:title                   | spc:ProceedingsVolume | rdf:langString       |
| spr:subtitle                | spc:ProceedingsVolume | rdf:langString       |
| spr:ISBN                    | spc:ProceedingsVolume | rdfs:literal         |
| spr:EISBN                   | spc:ProceedingsVolume | rdfs:literal         |
| spr:bookSeriesAcronym       | spc:ProceedingsVolume | rdfs:literal         |
| spr:hasConference           | spc:ProceedingsVolume | spc:Conference       |
| spr:bookDOI                 | spc:ProceedingsVolume | rdf:langString       |
| spr:isIndexedByScopus       | spc:ProceedingsVolume | sxs:boolean          |
| spr:scopusSearchDate        | spc:ProceedingsVolume | sxs:dateTime         |
| spr:isIndexedByCompendex    | spc:ProceedingsVolume | sxs:boolean          |
| spr:compendexSearchDate     | spc:ProceedingsVolume | sxs:dateTime         |
| spr:volumeNumber            | spc:Book              | rdfs:literal         |
| spr:title                   | spc:Book              | rdf:langString       |
| spr:subtitle                | spc:Book              | rdf:langString       |
| spr:ISBN                    | spc:Book              | rdfs:literal         |
| spr:EISBN                   | spc:Book              | rdfs:literal         |
| spr:bookSeriesAcronym       | spc:Book              | rdfs:literal         |
| spr:bookDOI                 | spc:Book              | rdf:langString       |
| spr:isIndexedByScopus       | spc:Book              | sxs:boolean          |
| spr:scopusSearchDate        | spc:Book              | sxs:dateTime         |
| spr:copyrightYear           | spc:Book              | sxs:date             |
| spr:isIndexedByCompendex    | spc:Book              | sxs:boolean          |
| spr:compendexSearchDate     | spc:Book              | sxs:dateTime         |
| spr:title                   | spc:BookChapter       | rdf:langString       |
| spr:subtitle                | spc:BookChapter       | rdf:langString       |
| spr:bookDOI                 | spc:BookChapter       | rdf:langString       |
| spr:firstPage               | spc:BookChapter       | sxs:int              |
| spr:lastPage                | spc:BookChapter       | sxs:int              |
| spr:chapterRegistrationDate | spc:BookChapter       | sxs:date             |
| spr:chapterOnlineDate       | spc:BookChapter       | sxs:date             |
| spr:chapterOf               | spc:BookChapter       | spc:Book             |
| spr:copyrightYear           | spc:BookChapter       | sxs:date             |
| spr:copyrightHolder         | spc:BookChapter       | rdf:string           |
| spr:metadataRights          | spc:BookChapter       | rdf:string           |
| spr:abstractRights          | spc:BookChapter       | rdf:string           |
| spr:bibliographyRights      | spc:BookChapter       | rdf:string           |
| spr:bodyHtmlRights          | spc:BookChapter       | rdf:string           |
| spr:bodyPdfRights           | spc:BookChapter       | rdf:string           |
| spr:esmRights               | spc:BookChapter       | rdf:string           |
------------------------------------------------------------------------------

Exploring some predicates

dc:creator

I expected dc:creator to be used for author names. But dc:creator takes only one value: "Springer"@en, no doubt to designate the creator of the database. No other predicate seems to carry the names of the authors.
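
A query along the following lines could be used to list the values taken by dc:creator. It is a sketch that I have not run against the endpoint; the variable names ?creator and ?n are my own.

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
# List each distinct dc:creator value with the number of triples using it.
select ?creator (count(?s) as ?n) where {
  ?s dc:creator ?creator .
}
group by ?creator
order by desc(?n)
```

If dc:creator really takes a single value, the result has exactly one row.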

rdf:type

The following query will allow us to see the distribution of the types used:

select distinct ?o (count(?o) as ?typecount) where {
?s a ?o .
}
group by ?o
order by desc(?typecount)

gives

----------------------------------------------------------------
| o                                                | typecount |
================================================================
| spc:BookChapter                                  | 381657    |
| spc:Book                                         | 50102     |
| spc:ProceedingsVolume                            | 9509      |
| spc:Conference                                   | 8487      |
| spc:ConferenceSeries                             | 1477      |
| rdf:Property                                     | 39        |
| <http://www.w3.org/2002/07/owl#DatatypeProperty> | 34        |
| <http://www.w3.org/2002/07/owl#Class>            | 5         |
| <http://www.w3.org/2002/07/owl#ObjectProperty>   | 5         |
| <http://rdfs.org/ns/void#Dataset>                | 2         |
----------------------------------------------------------------

spr:bookDOI

This predicate presumably associates a DOI with each document. By its nature, a DOI identifies a document uniquely. I am interested in the form Springer uses to record the DOI (I noted, for example, that various forms are used in the Telecom ParisTech database).

select ?doi  where {
?s spr:bookDOI ?doi .
}
limit 5

gives

----------------------------------
| doi                            |
==================================
| "10.1007/978-3-319-09147-1"@en |
| "10.1007/978-3-319-09147-1"@en |
| "10.1007/978-3-319-10762-2"@en |
| "10.1007/978-3-319-10762-2"@en |
| "10.1007/978-3-319-07785-7"@en |
----------------------------------

We see that DOIs are represented homogeneously in the Springer database. I have checked this on a larger number of examples.
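
One possible way to test this over the whole database, rather than on samples, is sketched below. It is not the check reported above and has not been run against the endpoint. It assumes that an irregular DOI would show up as a value that does not start with the bare "10." registrant prefix (for example a URL or a "doi:" form), and that the server supports the SPARQL 1.1 function strstarts. The variable name ?irregular is my own.

```sparql
PREFIX spr: <http://lod.springer.com/data/ontology/property/>
# Count DOI literals that are not written in the bare "10.xxxx/..." form.
# str() drops the @en language tag before the comparison.
select (count(?doi) as ?irregular) where {
  ?s spr:bookDOI ?doi .
  filter (!strstarts(str(?doi), "10."))
}
```

A count of 0 would be consistent with a homogeneous representation.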

Predicates about conferences

Several predicates seem to concern conference series. I will look at how many series there are and which ones have had the most occurrences.

There are 1477 subjects of type spc:ConferenceSeries (see the distribution of types above).

The following query gives the conference series with the most conferences published by Springer:

select ?name (count(distinct ?conf) as ?c) where {
?conf a spc:Conference  .
?conf spr:hasSeries ?serie  .
?serie a spc:ConferenceSeries  .
?serie spr:confSeriesName ?name
}
group by ?name
order by desc(?c)
limit 20

gives

--------------------------------------------------------------------------------------------------------
| name                                                                                            | c  |
========================================================================================================
| "International Colloquium on Automata, Languages, and Programming"@en                           | 40 |
| "International Symposium on Mathematical Foundations of Computer Science"@en                    | 38 |
| "Annual Cryptology Conference"@en                                                               | 33 |
| "Annual International Conference on the Theory and Applications of Cryptographic Techniques"@en | 33 |
| "International Conference on Applications and Theory of Petri Nets and Concurrency"@en          | 32 |
| "International Workshop on Graph-Theoretic Concepts in Computer Science"@en                     | 32 |
| "International Symposium on Distributed Computing"@en                                           | 29 |
| "International Conference on Computer Aided Verification"@en                                    | 28 |
| "International Conference on Concurrency"@en                                                    | 28 |
| "International Conference on Information Security and Cryptology"@en                            | 28 |
| "International Conference on Advanced Information Systems Engineering"@en                       | 27 |
| "International Semantic Web Conference"@en                                                      | 27 |
| "Ada-Europe International Conference on Reliable Software Technologies"@en                      | 26 |
| "European Conference on Object-Oriented Programming"@en                                         | 26 |
| "European Symposium on Programming Languages and Systems"@en                                    | 25 |
| "International Conference on Conceptual Modeling"@en                                            | 25 |
| "International Workshop on Languages and Compilers for Parallel Computing"@en                   | 25 |
| "Annual Symposium on Combinatorial Pattern Matching"@en                                         | 24 |
| "Annual Symposium on Theoretical Aspects of Computer Science"@en                                | 24 |
| "International Conference on Algorithmic Learning Theory"@en                                    | 24 |
--------------------------------------------------------------------------------------------------------

This probably gives an overview of the themes most published by Springer.

Update frequency of the database

Scientific papers are published each month.

To get an idea of the freshness of the data available here, I ran a first test on a book to which I contributed, "Multimodal Interaction with W3C Standards", whose DOI is 10.1007/978-3-319-42816-1. It was not in the database on 3/12/2016.
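
Such a presence test can be written as an ASK query, sketched below under the assumption that the book would be recorded through spr:bookDOI like the samples shown earlier. I have not run it against the endpoint.

```sparql
PREFIX spr: <http://lod.springer.com/data/ontology/property/>
# Returns true if at least one subject carries this DOI.
# str() is used because the DOI literals seen earlier carry an @en tag.
ask {
  ?s spr:bookDOI ?doi .
  filter (str(?doi) = "10.1007/978-3-319-42816-1")
}
```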

Some predicates carry date information. I will look for the most recent date in the database, using the most frequent predicate of this kind, spr:chapterRegistrationDate, which gives dates of the form

 "2002-01-01"^^xs:date

The query

select distinct ?date where {
?s spr:chapterRegistrationDate ?date  .
}
order by desc(?date)
limit 5

gives the following surprising result

-------------------------
| date                  |
=========================
| "2017-09-09"^^xs:date |
| "2017-07-25"^^xs:date |
| "2017-06-14"^^xs:date |
| "2017-05-20"^^xs:date |
| "2016-12-19"^^xs:date |
-------------------------

The most recently registered document is dated in the future!?!

In any case, this suggests that this database is regularly updated, even if I do not yet know how the recorded dates should be interpreted.
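
To see whether these future dates are isolated values or a sizeable part of the data, the chapters could be counted per registration year from 2017 onward, 2017 being the year after this test was run. The query below is a sketch that I have not run against the endpoint; the variable names ?year and ?n are my own.

```sparql
PREFIX spr: <http://lod.springer.com/data/ontology/property/>
# Number of chapters per registration year, from 2017 onward.
# The year is compared as a string: four-digit years sort lexicographically,
# which avoids relying on xs:date comparison support in the engine.
select ?year (count(?s) as ?n) where {
  ?s spr:chapterRegistrationDate ?date .
  bind (substr(str(?date), 1, 4) as ?year)
  filter (?year >= "2017")
}
group by ?year
order by ?year
```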

Conclusion

This exploration confirms what I have sensed since the beginning of the SemBib project: there are more and more sources of bibliographic data, but each has its own objectives and is incomplete for other purposes, such as those of SemBib.

This also confirms the direction chosen for SemBib: to build a data graph specific to SemBib, but interconnected with other graphs. SemBib advocates a federation of interconnected bibliographic graphs.
