Posted on 14/11/2014 by Michiel Hildebrand
In this post, we explain how we built a prototype application on top of three linked datasets. We detail how the required backend services combine information from multiple sources and how we constructed these services with Spinque. Using Spinque's search-by-strategy approach, we were able to create all required backend functionality by constructing strategies in the graphical editor; not a single line of code had to be written!
COMSODE turns datasets from Czech governmental institutions into Linked Open Data. One of these datasets is from the Czech Trade Inspection Authority (CTIA). The CTIA monitors and inspects businesses and individuals who sell goods or provide services on the Czech market. We used this dataset to create a search application for the general public, to help find good restaurants. Here, "good" means that the CTIA did not find issues with respect to hygiene, customer friendliness and legitimacy of the business. The goal was to provide functionality comparable to the popular service Yelp, allowing users to search for a restaurant and display the restaurants on a map. In addition, we wanted users to be able to filter the results to a specific geographic area by zooming and panning the map.
The dataset from CTIA alone is not sufficient to provide the application functionality, because it contains no information about the restaurants beyond the inspection results themselves. Specifically, it does not contain the businesses' name and address information that we need to display search results. It also does not state at which type of business an inspection was conducted, so we could not limit results to restaurants only. This information is, however, included in other datasets, and within COMSODE these are made available as Open Data and linked to the CTIA inspections.
The CTIA dataset contains information about the inspections that were conducted, such as the sanctions that were given, and under which acts these were given. The dataset also contains the address at which the inspection was conducted. This dataset also contains a reference to the business entity inspected, using an identifier from the Czech Trade Licensing Register. This registry is available in the ARES system and is also made available as Linked Open Data by the COMSODE project. The ARES dataset contains information about the business such as the name, the address and the activities conducted by the business.
ARES describes legal entities, e.g. the organization owning a restaurant or restaurant chain. The address of this legal entity is not necessarily equal to the actual location of the inspection; it may just refer to the legal entity's office. The general public that we target with the search application is, however, not interested in the legal entity, but rather in a specific restaurant and its address to visit. We get this information by including a third open dataset, the registry of food-related entities from the Ministry of Health. Every business in the Czech Republic that operates a food or catering service must register with the Ministry of Health. For now, we consider all entities in this data as restaurants. The dataset contains the name and address of each restaurant. It also contains the identifier of the legal business, linking it to the trade registry.
In COMSODE, the inspection data from CTIA, the trade registry from ARES and the restaurant list from the Ministry of Health are each converted and published as Linked Open Data. These three datasets and their links are shown in the diagram below: legal businesses in blue, inspections in green and restaurants in red. Restaurants and inspections are both linked to the legal business entity from ARES, as the Ministry of Health and the CTIA both use the trade registry identifiers.
The original data does not, however, contain explicit references between the inspections and restaurants, even though such links are crucial to give insight into the results of inspections conducted at a specific restaurant. COMSODE provides the tools to create such links (more on this topic in a follow-up post). In this case, we matched the addresses of the inspections with the addresses of the restaurants that are linked to the same legal business. In the diagram, these links are shown by the dotted line. To display the restaurants on a map, and allow filtering by a geographic area, we need their geographic locations. Again, COMSODE provides tools to get these, using a third-party geocoding API, such as those provided by Google Maps or MapQuest.
Schematic graph of Linked Datasets used for restaurant search application
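The address matching that creates the dotted links can be illustrated with a minimal sketch. The post does not describe how the COMSODE linking tools compare addresses, so everything here is an assumption: the record layout (`id`, `business_id`, `address`), the normalization rules, and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_address(address: str) -> str:
    """Strip diacritics and punctuation, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", address)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", without_marks.lower())
    return " ".join(cleaned.split())


def link_inspections(inspections, restaurants):
    """Return (inspection_id, restaurant_id) pairs.

    Assumed rule: two records are linked when they share the same legal
    business identifier and the same normalized address.
    """
    index = {}
    for restaurant in restaurants:
        key = (restaurant["business_id"], normalize_address(restaurant["address"]))
        index.setdefault(key, []).append(restaurant["id"])

    links = []
    for inspection in inspections:
        key = (inspection["business_id"], normalize_address(inspection["address"]))
        for restaurant_id in index.get(key, []):
            links.append((inspection["id"], restaurant_id))
    return links
```

Restricting candidates to restaurants of the same legal business keeps the comparison cheap and avoids linking an inspection to an unrelated restaurant that happens to share a building.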
Given these three linked datasets, Spinque search strategies power the Web Application. As shown in the screenshot below, this application consists of three parts: a search field and filters (top), the list of restaurants (left), and a map showing the locations of the restaurants (right).
The user can quickly find restaurants by name, or search for restaurants in a street, city or area. The displayed results are color coded based on the inspection information from the CTIA: green for restaurants that are satisfactory, orange when there were issues found in the last inspection and the restaurant needs to improve, and grey if the status is unknown because no inspections were conducted.
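As a rough illustration of the color coding, the rule can be written as a small function. This is a sketch under assumptions: the record fields `date` and `issues` are hypothetical, and "satisfactory" is interpreted here as "the most recent inspection found no issues", which the post does not state explicitly.

```python
from typing import Optional, Sequence


def status_color(inspections: Optional[Sequence[dict]]) -> str:
    """Map a restaurant's inspections to a display color.

    Each inspection is assumed to be a dict with an ISO-formatted "date"
    string and a list of "issues" (empty when nothing was found).
    """
    if not inspections:
        return "grey"  # status unknown: no inspections were conducted
    latest = max(inspections, key=lambda inspection: inspection["date"])
    return "orange" if latest["issues"] else "green"
```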
The checkboxes (shown below the search field) correspond to these categories and enable filtering of the results. The map shows the locations of the restaurants. In addition, it serves as a query mechanism: the results are restricted to the area that is visible on the map, so zooming or panning the map changes the result list. Selecting a result on the map shows the details of the restaurant in the result list, including the inspections that were conducted and the issues that were found. Vice versa, hovering over a restaurant in the list highlights its location on the map.
Screenshot of restaurant search application
The frontend of the application uses straightforward Web technologies, nothing exciting for most Web developers. The exciting part is at the backend. The application requires various data services: autocompletion on the names of the restaurants, full text search on restaurants, filtering by inspection category and filtering by geographical area.
The exciting part: we created all this backend functionality with Spinque strategies!
Creating the backend functionality for the application required no coding. All services were created with the Spinque strategy editor and published as APIs, all from within a Web browser.
Within COMSODE, we make the Spinque software suite compatible with Linked Data. We used the first version of Spinque Linked Data to create the backend functionality for the restaurant application. The starting point is an instance of Spinque that contains the COMSODE datasets, imported from an Open Data Node (ODN).
Using the Spinque editor (shown in the screenshot below), we can create all the functionality required for our restaurant search application. The interface consists of three main parts. On the left side the library with building blocks, categorized by function. In the middle, you see the strategy canvas with a graphical representation of the search strategy. The right side of the interface contains tools to compile the strategy and preview the results.
Screenshot of the Spinque strategy editor
The user creates a strategy by dragging building blocks from the library onto the strategy canvas and connecting them by dragging lines between the input and output connectors. To make a search strategy available as a search engine, it is compiled and published as an API, using the management interface (not shown). Once a strategy is published, it is immediately available through the Spinque REST API.
Search strategies specify how search takes place. This is not restricted to 'just' ranking the restaurants that we create the application for - all 'intelligence' in the web application can be expressed through search strategies.
Take for example the strategy that provides autocompletion suggestions for restaurant names, depicted on the strategy canvas in the screenshot above. It consists of three building blocks. The first block, labeled genericIndex, defines the source. In this case the index containing all datasets. The second block filters this index to restaurants, by restricting to resources with the RDF type http://schema.org/FoodEstablishment. The last block is configured to take an input parameter, the query, and filter the restaurants on those that have a matching name. When compiled and published this simple strategy provides an autocompletion API.
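For readers who think in code, the three blocks correspond roughly to the following sketch over an in-memory list of RDF triples. This is not how Spinque executes a strategy; it only mirrors the logic. The use of `http://schema.org/name` for restaurant names and the word-prefix matching rule are assumptions, as is the function name `autocomplete`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOOD_ESTABLISHMENT = "http://schema.org/FoodEstablishment"
NAME = "http://schema.org/name"  # assumed property holding the restaurant name


def autocomplete(triples, query, limit=10):
    """Return (resource, name) suggestions for restaurants matching the query.

    triples: iterable of (subject, predicate, object) tuples, standing in
    for the index that contains all datasets.
    """
    triples = list(triples)
    # Block 2: restrict the index to resources typed as food establishments.
    restaurants = {s for s, p, o in triples if p == RDF_TYPE and o == FOOD_ESTABLISHMENT}
    # Block 3: keep restaurants with a name in which some word starts with the query.
    prefix = query.casefold()
    suggestions = []
    for s, p, o in triples:
        if s in restaurants and p == NAME:
            if any(word.startswith(prefix) for word in o.casefold().split()):
                suggestions.append((s, o))
    return sorted(suggestions, key=lambda pair: pair[1])[:limit]
```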
We extended this simple strategy to rank the restaurants by their distance to the current location of the user. In the current Web application, the location of the user is the center of the map. In a mobile application, the GPS coordinates of the user could be used for this purpose. The figure below shows the extended strategy. The first three building blocks are the same. The next block, TraverseRelation, gets the address of each restaurant by following the http://schema.org/address relation in the graph. The addresses are then ranked by their distance to a geographical point. This is done in the building block rankOnPointDistance, which takes the latitude and longitude coordinates as input parameters. In the last block, we traverse the graph back from the ranked addresses to the restaurants. Published as an API, this strategy provides an autocompletion service that returns restaurants near the user first.
Spinque search strategy for autocompletion functionality ranking restaurants by distance to the user's location
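The traverse, rank, and traverse-back steps can be sketched as follows. The post does not say which distance formula rankOnPointDistance uses; the great-circle (haversine) distance below is an assumption, and the two dictionaries are hypothetical stand-ins for the graph relations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_by_distance(restaurant_address, address_coords, lat, lon):
    """Return restaurant ids ordered from nearest to farthest.

    restaurant_address: restaurant id -> address id (the address relation)
    address_coords:     address id -> (latitude, longitude)
    Restaurants whose address has no coordinates are left out.
    """
    scored = []
    for restaurant_id, address_id in restaurant_address.items():
        if address_id in address_coords:
            address_lat, address_lon = address_coords[address_id]
            distance = haversine_km(lat, lon, address_lat, address_lon)
            scored.append((distance, restaurant_id))
    return [restaurant_id for _, restaurant_id in sorted(scored)]
```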
The restaurants shown in the result list (and on the map) are also the output of a strategy. This strategy takes as input a set of keywords and the bounds of the map, and generates as output the list of restaurants contained within the bounds of the map that match the keywords. Ranking the results considers both the user's keywords and their CTIA inspection status: we consider restaurants to be good if they are frequently inspected and no issues are found during these inspections.
The figure below shows the part of the strategy that computes the inspection-related 'prior' weighting for all restaurants. Again, it starts with the full database, filtered to include only restaurants. Using the block rankOnIntDistance, it ranks these by the number of inspections that were conducted at them (restaurants with more inspections are better). It then re-ranks these restaurants by the number of inspections that led to results (sanctions, bans, confiscations). In this case, restaurants for which no or few issues are reported are the better ones.
Excerpt of Spinque search strategy for restaurants that gives a prior ranking
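One way to picture such a prior is as the product of two factors, one rewarding frequent inspection and one rewarding clean inspections. The formula below is purely illustrative: the post does not specify how Spinque combines the two rankings, and the add-one smoothing and the function name are assumptions.

```python
def inspection_prior(counts):
    """Compute an illustrative prior score per restaurant.

    counts: restaurant id -> (number of inspections,
                              number of inspections that led to results)
    Returns restaurant id -> score in (0, 1]; higher is better.
    """
    max_inspections = max((n for n, _ in counts.values()), default=0)
    priors = {}
    for restaurant_id, (n_inspections, n_with_results) in counts.items():
        # More inspections is better; add-one smoothing keeps never-inspected
        # restaurants at a small non-zero score.
        frequency = (n_inspections + 1) / (max_inspections + 1)
        # A smaller share of inspections with sanctions, bans or confiscations is better.
        clean = (n_inspections - n_with_results + 1) / (n_inspections + 1)
        priors[restaurant_id] = frequency * clean
    return priors
```

With this choice, a restaurant that was inspected often without findings outranks one that was inspected rarely, which in turn outranks one that was inspected often with many findings.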
Next, the ranked list of restaurants is filtered by location, including only the restaurants that are within the area visible on the map. Finally, the ranked and filtered list of restaurants is the input for a keyword search algorithm. This part of the strategy is shown below.
The user input is taken by the keywords block at the top. These keywords are stemmed using a Czech stemmer and then used as input for the block rank_text_BM25, which applies the BM25 search algorithm. The strategy includes two BM25 blocks: the block on the left takes as input the addresses of the restaurants, whereas the block on the right searches on the names. The addresses are compound objects with separate attributes for the street name, postal code, city, region, etc. The algorithm searches all these address attributes and then traverses back in the graph from the address to the restaurant, using TraverseRelationBackward. Finally, the results of the two searches are merged, providing a search algorithm to find restaurants by name and address. The final search strategy that we developed mixes these results with those where the user's query terms are first stemmed for the English language.
Excerpt of Spinque strategy to find restaurants by name and address
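The two-branch keyword search can be approximated in a few lines. This is a minimal sketch, not Spinque's rank_text_BM25 block: it uses a common BM25 variant with assumed parameters `k1` and `b`, flattens each address into a single string instead of separate attributes, replaces the Czech stemmer with a pluggable `stem` function that only lowercases by default, and merges the two result lists by summing scores, which is also an assumption.

```python
import math
from collections import Counter


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score tokenized documents against query terms with BM25.

    docs: document id -> list of tokens. Only matching documents are returned.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return {}
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        if score > 0:
            scores[doc_id] = score
    return scores


def tokenize(text, stem=str.lower):
    """Split on whitespace and apply the (placeholder) stemmer to each word."""
    return [stem(word) for word in text.split()]


def search(restaurants, query, stem=str.lower):
    """Rank restaurant ids by the sum of their name and address BM25 scores.

    restaurants: id -> {"name": str, "address": str}
    """
    terms = tokenize(query, stem)
    merged = {}
    for field in ("name", "address"):
        docs = {rid: tokenize(record[field], stem) for rid, record in restaurants.items()}
        for rid, score in bm25_scores(docs, terms).items():
            merged[rid] = merged.get(rid, 0.0) + score
    return sorted(merged, key=merged.get, reverse=True)
```

Passing a different `stem` function is where a real Czech or English stemmer would plug in.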
To summarize, the strategy takes as input a search query and the bounds of the map and returns a list of restaurants that are ranked by status of the inspections in combination with the text based ranking. The API from this published strategy is used to get the restaurants that are shown in the result list and on the map.
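The geographic filtering step, which keeps only restaurants inside the visible map area, might look like the sketch below. The `Bounds` type, the coordinate dictionary, and the handling of a viewport that crosses the 180° meridian are assumptions for illustration; the post does not describe how the Spinque block represents map bounds.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    """Map viewport in degrees (hypothetical representation)."""
    south: float
    west: float
    north: float
    east: float


def within_bounds(lat, lon, bounds):
    """Return True when the point lies inside the viewport."""
    if not (bounds.south <= lat <= bounds.north):
        return False
    if bounds.west <= bounds.east:
        return bounds.west <= lon <= bounds.east
    # Viewport crosses the 180° meridian, so the longitude range wraps around.
    return lon >= bounds.west or lon <= bounds.east


def filter_to_viewport(ranked_ids, coords, bounds):
    """Keep the ranking order, dropping restaurants outside the viewport.

    ranked_ids: restaurant ids, best first
    coords:     restaurant id -> (latitude, longitude)
    """
    return [
        rid for rid in ranked_ids
        if rid in coords and within_bounds(coords[rid][0], coords[rid][1], bounds)
    ]
```

Because the filter preserves the incoming order, it can sit between the prior ranking and the keyword search without disturbing either.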
In this post, we walked through the definition of a restaurant search application over three integrated open datasets. The frontend of this application was built with standard Web technologies (under 500 lines of JavaScript). All the required backend services were created with Spinque (0 lines of JavaScript!).
We used the exact same Spinque instance to create the backend functionality for a completely different search application, focusing on legal entities in the Czech Republic. In other words, developing web applications with open data turns into clicking together the API functions you need; different applications can share the same Spinque index.
Within COMSODE we are now working hard to make Spinque available for third party developers. This will enable application developers to build the custom APIs they need to power novel applications on top of Linked Open Data. If you are interested in trying this out for yourself, do not hesitate to contact us.