Unique Researcher Identifiers versus Uniqueness of Researcher Identifiers

As mentioned in the article “First contact with the tools of the ABES”, I started the SemBib project by using my own identifiers for the researchers. Then I wanted to use identifiers from reference sources, starting with the IDREF identifiers of the ABES.

That was enough to get my finger caught in the gears: one thing led to another.

Given the difficulties I encountered in recovering the IDREF identifiers for all Telecom ParisTech researchers, and having seen that the ABES has agreements with VIAF, I looked at what I could do with VIAF. VIAF is a “virtual international authority file” created by a group of national libraries. It manages a set of unique identifiers for people. The post mentioned above describes how I retrieved information about Telecom ParisTech researchers from VIAF.

Analyzing the data, I saw that the VIAF records came from a variety of sources; step by step, following these sources, I found identifiers of people from:

BNF: the Bibliothèque Nationale de France assigns identifiers to authors, and especially to authors of scientific publications;
ARK: an identification scheme (Archival Resource Key) also used by the BNF;
SUDOC: a catalog produced by the ABES, which manages the IDREF identifiers;
ISNI: the “International Standard Name Identifier”, defined by an ISO standard and also used, among others, by the BNF (see ISNI and BNF);
ORCID: these identifiers concern authors and contributors in higher education and research; there are imperfect links between ISNI and ORCID (see Relationship between ORCID and ISNI);
RERO: appears to be defined by the Network of the Libraries of Western Switzerland;
LC: identifiers used by the Library of Congress;
KRNLK: the Linked Open Data access point of the National Library of Korea, which includes SPARQL access (http://lod.nl.go.kr/home/sparql/se.jsp);
ICCU: used by the Central Institute for a Unified Catalog of Italian Libraries (ICCU);
LNB: identifiers of the National Library of Latvia;
NKC: identifiers of the National Library of the Czech Republic;
NLI: identifiers of the National Library of Israel;
NLP: identifiers of the National Library of Poland;
NSK: identifiers of the National and University Library in Zagreb;
NUKAT: comes from the NUKAT Center of the University of Warsaw;
SELIBR: used by LIBRIS, a search service that provides information on titles held by Swedish university and research libraries (example: http://libris.kb.se/auth/264078);
WKD: the identifiers used by Wikidata;
BLSA: probably originates from the British Library;
NTA: identifiers of the Royal Library of the Netherlands.
I probably forgot some others…

Each of these identifiers designates a researcher uniquely within its own identification system. But, as we have just seen, the same researcher can have many identifiers: not every researcher has all of them, but most have several.

For example, Antonio Casilli is identified at least by:

BNF, ARK, ISNI, VIAF, LC, SUDOC, ORCID, DNB | 1012066622, NUKAT | n 2016165182

Researchers thus have several more or less equivalent identifiers, which are useful to know in a Linked Open Data approach: if one wants to link the data about a researcher, one must first be able to link that researcher's unique identifiers! I will come back in a future post to the decentralized solution that I propose.
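To make the linking problem concrete, here is a minimal Python sketch. It is not the decentralized solution announced above, only an illustration of the idea: each researcher is given a hypothetical local key, the identifiers known for that researcher are stored per scheme, and a reverse index lets us go from any one identifier to the others. The only identifier values used are the two quoted in the example above (DNB and NUKAT); the local key, the function names, and the data layout are assumptions made for illustration.

```python
from collections import defaultdict

# Identifier values taken from the example above. The other schemes
# (BNF, ARK, ISNI, ...) are left out because their values are not given here.
RESEARCHERS = {
    "antonio-casilli": {  # hypothetical local key, not an official identifier
        "DNB": "1012066622",
        "NUKAT": "n 2016165182",
    },
}


def build_index(researchers):
    """Map each (scheme, value) pair to the local keys of the researchers carrying it."""
    index = defaultdict(set)
    for local_key, identifiers in researchers.items():
        for scheme, value in identifiers.items():
            index[(scheme, value)].add(local_key)
    return index


def equivalents(index, researchers, scheme, value):
    """Return the other identifiers known for the researcher(s) matching (scheme, value)."""
    found = {}
    for local_key in index.get((scheme, value), ()):
        for other_scheme, other_value in researchers[local_key].items():
            if (other_scheme, other_value) != (scheme, value):
                found[other_scheme] = other_value
    return found


INDEX = build_index(RESEARCHERS)
print(equivalents(INDEX, RESEARCHERS, "DNB", "1012066622"))
```

Starting from the DNB identifier, the lookup returns the NUKAT identifier of the same person. In this sketch, an identifier that no researcher carries simply yields an empty result.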

Note: this reminds me of the joke about video standards: “there are N different standards, which is too many; to put an end to it, I will create a single format that brings together the best of each standard”. After all that work, we do not have 1 standard, but N + 1 standards…

