Spinque is a small, energetic company with a revolutionary product. The setting is quite academic, with many connections to universities and research institutes. We are all enthusiastic about database technology and the field of information retrieval in general. Spinque is situated in the old factory warehouse Hooghiemstra at the edge of Utrecht's city center. One of the best things about Spinque is the large variety of clients, from cultural heritage institutes to beer breweries to universities, which makes it a fun place to work.
Besides the open positions below, we are always looking for people who share our passion. Please contact us so that we can have a cup of coffee at our office, or send your resume to firstname.lastname@example.org.
When searching over small collections, say the documents that belong to one organization, it can be difficult to create a stable ranking when that ranking depends on the term distributions of those collections. There are efforts in open web search to release partial indexes of the web through a Common Index File Format (CIFF), which can be used to develop search systems. Because these indexes are created from larger collections, their term distributions are more stable, and they might provide useful additional information when integrated into search solutions for smaller collections.
How this data should be integrated is not clear, however. Smaller collections have more skewed distributions; moreover, the frequency of certain terms is, and should be, different from that in a larger web collection. We are interested in whether we can integrate data from larger collections and thereby increase the effectiveness of our search solutions.
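To make the question concrete, one possible starting point (an illustrative assumption, not a method Spinque prescribes) is to interpolate a term's inverse document frequency between the small local collection and a larger external index. In the minimal Python sketch below, the function names, the `weight` parameter, and the plain dictionaries of document frequencies are all hypothetical; reading the statistics from an actual CIFF file is left out.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Smoothed, BM25-style inverse document frequency (never negative)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_idf(
    term: str,
    local_df: dict[str, int],
    local_n: int,
    external_df: dict[str, int],
    external_n: int,
    weight: float = 0.5,
) -> float:
    """Linearly interpolate local and external IDF for one term.

    weight = 1.0 uses only the local collection, weight = 0.0 only the
    external index. Both the linear form and the default value are
    assumptions made for illustration.
    """
    local = idf(local_df.get(term, 0), local_n)
    external = idf(external_df.get(term, 0), external_n)
    return weight * local + (1.0 - weight) * external


# Hypothetical statistics: a term that is very common locally
# but comparatively rare in a larger external collection.
local_stats = {"gemeente": 90}
external_stats = {"gemeente": 50_000}
score = blended_idf("gemeente", local_stats, 100, external_stats, 1_000_000, weight=0.7)
```

A fixed weight treats every term the same way, which sidesteps the point made above that some terms should keep their collection-specific frequency; choosing the weight per term or per collection is one of the things an investigation could explore.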
The assignment is to investigate how data from external indexes can be integrated into existing search solutions to increase the effectiveness of their ranking methods. We are particularly interested in doing this in the context of raadzaam, a search engine developed by Spinque for the council information of municipalities in the Netherlands. To achieve this, we have defined the following assignments: