Open Data, Open Art

Posted on 29/07/2014 by Michiel Hildebrand

At Spinque we like Open Data. We also like art. What better than to combine them? In this post we illustrate how to use open data to enrich access to the art collection of the Rijksmuseum.

Het Melkmeisje by Johannes Vermeer

The Rijksmuseum in Amsterdam is one of the most famous museums in the world. The museum's amazing collection is publicly available: you can search it on the Rijksmuseum website or fetch the collection metadata. The data contains artworks with descriptions (or annotations) provided by the cataloguers of the museum. As at many museums and other cultural heritage institutions, these annotations cover the basic object characteristics such as the creator, date and material. In addition, cataloguers have described what is depicted in the artworks: the subject matter.

We did a little project at Spinque to explore different strategies for searching the Rijksmuseum collection. We started with an RDF representation of the artwork collection and the thesauri that are used to describe the artworks. We first demonstrate how to make a basic search engine on top of this data. Next we integrate additional Open Data sets to enrich the search experience. We improve the ranking, enable multilingual search, and provide recommendations of related artists and artworks.

Searching RDF

Building a basic search engine for the Rijksmuseum collection is straightforward. If you want to know how to do this with Spinque's search-by-strategy technology, take a look at the screencast. We started out with this basic search strategy on the Rijksmuseum collection, and it turns out that it does not work very well. When we search for the famous painting by Vermeer using the title 'melkmeisje', the ranking of the results is not very useful. Why do we not get the famous masterpiece by Vermeer as the first result? The problem is that the basic search strategy only uses text statistics, according to the BM25 retrieval model. In this collection the textual features alone do not predict whether an artwork is famous. Filtering on the creator Johannes Vermeer with the facets solves the problem, but the point is that the public expects the masterpieces on top.
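To make the ranking problem concrete, here is a minimal sketch of BM25 scoring in Python. It is not Spinque's implementation: the function name, the parameter defaults (k1 = 1.2, b = 0.75, common textbook values) and the example titles are all assumptions made for illustration. It shows that two records with the same title receive exactly the same score, so text statistics alone cannot put the masterpiece first.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document for the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical titles, not actual Rijksmuseum records.
titles = [
    "het melkmeisje",       # imagine this is the Vermeer masterpiece
    "het melkmeisje",       # imagine this is a print after the painting
    "melkmeisje met koe",
    "landschap met molen",
]
scores = bm25_scores("melkmeisje", titles)
```

The first two titles get identical scores, the longer third title scores slightly lower, and the last one scores zero. Nothing in the text tells the ranker which of the first two is the famous painting.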

In this case there is a simple solution. The Rijksmuseum has explicitly annotated the famous artworks. With Spinque we can include this information in the search strategy. We extend the basic strategy by boosting the prior score of the famous artworks. Now when we search for 'melkmeisje', the famous painting by Vermeer is the first result.
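The idea of boosting a prior can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual Spinque strategy: the multiplicative combination, the boost factor of 2.0, the function name and the scores are all hypothetical.

```python
def rank_with_prior(text_scores, is_masterpiece, boost=2.0):
    """Return document indices ordered by text score times a prior."""
    final = [
        score * (boost if famous else 1.0)
        for score, famous in zip(text_scores, is_masterpiece)
    ]
    return sorted(range(len(final)), key=lambda i: final[i], reverse=True)

# Hypothetical text scores for three artworks that match 'melkmeisje'.
text_scores = [1.4, 1.5, 1.4]
# Assumed museum annotation: only the third artwork is marked as famous.
is_masterpiece = [False, False, True]

ranking = rank_with_prior(text_scores, is_masterpiece)
baseline = rank_with_prior(text_scores, [False, False, False])
```

Without the prior the third artwork ends up last in `baseline`. With the prior it moves to the first position in `ranking`, while the order of the other results is unchanged.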

Including DBPedia

The terms that are used by the cataloguers to annotate the artworks are maintained by the museum in their internal vocabularies (or thesauri). The vocabularies include names of artists and historical persons, geographical locations, art-specific concepts such as materials and content-specific concepts such as historical events. While some of these vocabularies are quite large, the information about the entities and concepts is rather sparse. For example, the materials are only available in Dutch, and for most artists only the basic biographical information is known.

The Web contains several open data sources that can complement this information. Wikipedia (or DBPedia) contains information about well-known artists such as Vermeer, including relations to other artists. Another relevant source is the Art & Architecture Thesaurus from the Getty Research Institute.

To make this open data available when searching the Rijksmuseum collection, we first need to integrate it. One approach is to follow the principles of Linked Data and create links between the objects in the Rijksmuseum collection and DBPedia. If you are interested in this approach, have a look at Amalgame and the SILK link discovery framework.
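As a rough illustration of what creating such links involves, the Python sketch below matches entries by normalized label. This is a deliberately naive technique and not how Amalgame or SILK work internally; those tools support much richer matching and evaluation. The local identifiers and the "surname, first name" label format are assumptions.

```python
import unicodedata

def normalize(label):
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(c for c in text if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(sorted(cleaned.split()))

def link_by_label(local, remote):
    """Link local ids to remote URIs whose normalized label is identical."""
    index = {}
    for uri, label in remote.items():
        index.setdefault(normalize(label), []).append(uri)
    links = {}
    for local_id, label in local.items():
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:  # skip missing and ambiguous matches
            links[local_id] = candidates[0]
    return links

# Hypothetical thesaurus entries; the ids are made up for this example.
local = {"person-1": "Vermeer, Johannes", "person-2": "Onbekend"}
remote = {"http://dbpedia.org/resource/Johannes_Vermeer": "Johannes Vermeer"}
links = link_by_label(local, remote)
```

Sorting the name parts lets "Vermeer, Johannes" match "Johannes Vermeer". Exact label matching like this misses spelling variants and can confuse namesakes, which is why dedicated alignment tools exist.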

Once we have links between the Rijksmuseum data and DBPedia, we can prioritize the artworks that are found there (that is, those described on Wikipedia). This is an alternative (or maybe complementary) approach to getting the masterpieces on top. DBPedia also contains the titles of artworks in other languages. Now you can also search in English for 'milkmaid', in Spanish for 'La Lechera', in Polish for 'Mleczarka', and in other languages. Quite handy for a museum with an international audience.
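Both effects of the links can be sketched together in Python. The data structures, the artwork identifiers, the DBPedia resource URI and the substring matching are assumptions for illustration; a real strategy would combine this with the ranking model rather than a simple filter.

```python
def search_titles(query, local_titles, links, dbpedia_labels):
    """Return ids of artworks whose own or linked titles contain the query."""
    q = query.lower()
    hits = []
    for art_id, title in local_titles.items():
        titles = [title]
        # Add the titles in other languages found via the DBPedia link.
        uri = links.get(art_id)
        if uri is not None:
            titles.extend(dbpedia_labels.get(uri, {}).values())
        if any(q in t.lower() for t in titles):
            hits.append(art_id)
    # Linked artworks first: a link suggests the work is described on Wikipedia.
    return sorted(hits, key=lambda a: a not in links)

# Hypothetical data; ids and the resource URI are illustrative only.
uri = "http://dbpedia.org/resource/The_Milkmaid_(Vermeer)"
local_titles = {"artwork-2": "Melkmeisje met koe", "artwork-1": "Het melkmeisje"}
links = {"artwork-1": uri}
dbpedia_labels = {uri: {"en": "The Milkmaid", "es": "La lechera", "pl": "Mleczarka"}}

english = search_titles("milkmaid", local_titles, links, dbpedia_labels)
polish = search_titles("mleczarka", local_titles, links, dbpedia_labels)
dutch = search_titles("melkmeisje", local_titles, links, dbpedia_labels)
```

The English and Polish queries find the linked artwork even though its local title is Dutch, and the Dutch query returns the linked artwork ahead of the unlinked one.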