Paris Musées and Wikidata: establishing links

As of 6/1/2019, my list of establishments attached to Paris Musées includes 14 museums under 16 denominations (listed at the end of this post). I built it by hand from the Paris Musées website.

I have developed several methods for finding links between these museums and the Wikidata entities that represent them.

The simplest method yields 15 links. It searches for each museum name using the search feature available through Wikidata's query service (WDQS). If the search returns more than one result, so that the correct answer is uncertain, the method keeps none of them. This is why ‘The Catacombs’ is not found by this method: the WDQS search returns two results.
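The name search with its single-result rule might be sketched as follows. This is a minimal sketch, assuming the search goes through the MediaWiki API service (`wikibase:mwapi` with `EntitySearch`) that WDQS exposes; the post does not say which search call was actually used, and the helper names (`search_query`, `run_query`, `pick_unique`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_literal(text: str) -> str:
    """Quote `text` as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def search_query(name: str, language: str = "fr") -> str:
    """Build a WDQS query running Wikidata's entity search on `name` (assumed service)."""
    return f"""SELECT DISTINCT ?item WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search {sparql_literal(name)};
                    mwapi:language "{language}".
    ?item wikibase:apiOutputItem mwapi:item.
  }}
}}"""


def run_query(query: str) -> List[str]:
    """Send `query` to WDQS and return the Q-identifiers bound to ?item."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; this value is a placeholder.
            "User-Agent": "paris-musees-linking-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Each binding holds a full entity URI; keep only the trailing Q-identifier.
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in data["results"]["bindings"]]


def pick_unique(candidates: List[str]) -> Optional[str]:
    """Keep a link only when the search is unambiguous (exactly one candidate)."""
    return candidates[0] if len(candidates) == 1 else None
```

For example, `pick_unique(run_query(search_query("Musée Zadkine")))` returns a Q-identifier, or `None` when the search is empty or ambiguous (this call needs network access). Discarding ambiguous searches trades some recall for precision, which is why ‘The Catacombs’ drops out.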

Adding the museum's city (Paris) and country (France) as criteria yields 8 entities.

Adding to the simple method the criterion that the entity must be a museum yields 14 entities, so two are missing. The ‘Palais Galliera’ is an instance of ‘palace’, which is not a type derived from ‘museum’. The ‘Petit Palais’ is of type ‘museum building’, which derives from ‘building’ but not from ‘museum’.
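A type criterion of this kind typically follows `instance of` (P31) and then any number of `subclass of` (P279) links, written in WDQS as `?item wdt:P31/wdt:P279* wd:Q33506` (Q33506 being ‘museum’). The toy model below uses a hypothetical, hand-written hierarchy containing only the relations described above, plus an invented ‘art museum’ class for contrast, to show why both palaces fall through: neither of their types reaches ‘museum’.

```python
# Hypothetical, simplified excerpt of a class hierarchy:
# each class maps to its direct superclasses ("subclass of", P279).
SUBCLASS_OF = {
    "art museum": {"museum"},
    "museum": set(),
    "palace": set(),  # superclasses omitted; none of them lead to "museum"
    "museum building": {"building"},
    "building": set(),
}


def is_a(instance_types, target):
    """True if any direct type (P31) reaches `target` through zero or more
    P279 links, i.e. the same test as the path wdt:P31/wdt:P279*."""
    stack = list(instance_types)
    seen = set()
    while stack:
        cls = stack.pop()
        if cls == target:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(SUBCLASS_OF.get(cls, ()))
    return False


print(is_a({"museum building"}, "museum"))    # False
print(is_a({"museum building"}, "building"))  # True
```

Because the path only climbs towards superclasses, a ‘museum building’ is a ‘building’ but never a ‘museum’, even though the label contains the word.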

Adding the further criterion that the museum is in France yields 12 entities.

Combining the museum and city criteria with the simple method yields 8 entities.

That makes five methods in all.
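The five methods can be read as the name search plus an optional set of extra constraints. The sketch below expresses each combination as SPARQL triple patterns appended to the search. It assumes the usual Wikidata identifiers (P31 ‘instance of’, P279 ‘subclass of’, P131 ‘located in the administrative territorial entity’, P17 ‘country’, Q33506 ‘museum’, Q90 ‘Paris’, Q142 ‘France’); the fourth method is taken to add the country criterion to the museum-type criterion and the fifth to add the city criterion to it. The method labels and patterns are illustrative, not the author's queries, and the transitive `wdt:P131*` path is an assumption, made because museums are often attached to an arrondissement rather than directly to Paris.

```python
# Extra constraints on ?item, as SPARQL triple patterns (identifiers assumed).
CONSTRAINTS = {
    "museum": "?item wdt:P31/wdt:P279* wd:Q33506 .",
    "paris": "?item wdt:P131* wd:Q90 .",
    "france": "?item wdt:P17 wd:Q142 .",
}

# The five methods, as combinations of constraint keys (hypothetical labels).
METHODS = {
    "simple": [],
    "city + country": ["paris", "france"],
    "museum": ["museum"],
    "museum + country": ["museum", "france"],
    "museum + city": ["museum", "paris"],
}


def build_query(name, constraint_keys, language="fr"):
    """Entity search for `name`, restricted by the given constraint keys."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    filters = "\n  ".join(CONSTRAINTS[key] for key in constraint_keys)
    return f"""SELECT DISTINCT ?item WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "{escaped}";
                    mwapi:language "{language}".
    ?item wikibase:apiOutputItem mwapi:item.
  }}
  {filters}
}}"""


print(build_query("Musée Zadkine", METHODS["museum + city"]))
```

Keeping the constraints in one table makes it easy to run every method over the same list of denominations and compare the results.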

Entities without a Wikidata link

None: every denomination of the Paris Musées museums obtains a Wikidata link through at least one of the proposed methods.

Entity with inconsistent Wikidata links

Only one name, Musée Zadkine, received different links depending on the method used. There is indeed a Zadkine museum in Arques and another in Paris, and each method returns one or the other. The geographical criterion yields the right link.
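Both checks above (names that no method links, and names that receive different links depending on the method) can be run on a table of results per method. This sketch assumes each method returns a dict from denomination to the entity it kept; the method labels and identifiers (`Q_ZADKINE_PARIS`, `Q_ZADKINE_ARQUES`, `Q_CATACOMBS`) are hypothetical placeholders, not real Wikidata items.

```python
from collections import defaultdict


def compare_methods(names, results_by_method):
    """Return (names no method linked, names linked to more than one entity).

    `results_by_method` maps a method label to {denomination: entity id};
    a denomination missing from a method's dict was not linked by that method.
    """
    links = defaultdict(set)
    for results in results_by_method.values():
        for name, entity in results.items():
            links[name].add(entity)
    unlinked = [name for name in names if not links[name]]
    conflicting = {name: sorted(links[name]) for name in names if len(links[name]) > 1}
    return unlinked, conflicting


# Hypothetical data: placeholder labels and identifiers.
names = ["Musée Zadkine", "Catacombs"]
results = {
    "method A": {"Musée Zadkine": "Q_ZADKINE_ARQUES"},
    "method B": {"Musée Zadkine": "Q_ZADKINE_PARIS", "Catacombs": "Q_CATACOMBS"},
}
unlinked, conflicting = compare_methods(names, results)
print(unlinked)     # []
print(conflicting)  # {'Musée Zadkine': ['Q_ZADKINE_ARQUES', 'Q_ZADKINE_PARIS']}
```

A conflict only flags the name for review; choosing between the candidates still relies on a criterion such as the museum's location.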

It remains to check that the selected answers are correct in every case.

Checking the 16 entities obtained for the 16 denominations

Evaluation

The evaluation is as follows:

  • results found: 16
  • desired results: 16
  • exact results: 16
  • inaccurate results: 0

In terms of precision and recall, this gives:

precision = number of exact results / number of results found = 16/16 = 100%

recall = number of exact links found / number of findable links = 16/16 = 100%

Hence an F-measure of:

f-measure = 2 * precision * recall / (precision + recall) = 2 * 1 * 1 / (1 + 1) = 1 = 100%
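The evaluation can be reproduced from the links found and a hand-checked reference mapping. In this sketch the reference (`gold`) is a hypothetical dict from denomination to expected entity, filled with placeholder identifiers; precision, recall and F-measure follow the formulas above.

```python
def evaluate(found, gold):
    """Precision, recall and F-measure of `found` links against `gold` links.

    Both arguments map a denomination to an entity id; `found` holds only the
    denominations for which a link was returned.
    """
    exact = sum(1 for name, entity in found.items() if gold.get(name) == entity)
    precision = exact / len(found) if found else 0.0
    recall = exact / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference: 16 denominations mapped to placeholder identifiers.
gold = {f"denomination {i}": f"Q_{i}" for i in range(16)}
found = dict(gold)  # every returned link matches the reference
print(evaluate(found, gold))  # (1.0, 1.0, 1.0)
```

With 16 exact results out of 16 found and 16 expected, the function returns the 100% figures above; a single wrong or missing link would lower precision, recall, or both.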

On this small set of denominations, the proposed methods prove fully satisfactory. We will soon publish the results of these methods on other datasets, along with technical details of the methods used.

This entry was posted in Public data, Semantic taging, SPARQL.