Getting started with Querqy

Querqy is a query rewriting framework for Java-based search engines. It is probably best known for its rule-based query rewriting, which applies synonyms, query-dependent filters, and boosting and demoting of documents. This rule-based query rewriting is implemented in the ‘Common Rules Rewriter’, but Querqy’s capabilities go far beyond this rewriter.

You might want to …

Installation

Installation under Elasticsearch

  • Stop Elasticsearch if it is running.

  • Open a shell and cd into your Elasticsearch directory.

  • Run Elasticsearch’s plugin install script:

./bin/elasticsearch-plugin install <URL>

For example, the install command for plugin version 1.7.es892.0 is:

./bin/elasticsearch-plugin install \
  "https://repo1.maven.org/maven2/org/querqy/querqy-elasticsearch/1.7.es892.0/querqy-elasticsearch-1.7.es892.0.zip"
  • Answer yes to the security-related questions (Querqy needs special permissions to load query rewriters dynamically).

  • When you start Elasticsearch, you should see the INFO log message "loaded plugin [querqy]".
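
The install URL follows a regular pattern, so the command can be generated for any plugin version. The sketch below is a convenience script, not part of Querqy: the function names are hypothetical, the URL pattern is inferred from the single example above, and the check assumes that Elasticsearch’s _cat/plugins API reports the plugin under the component name querqy (the name shown in the log message). The fetch helper further assumes an unsecured node at http://localhost:9200; a secured cluster needs HTTPS and credentials.

```python
import json
import urllib.request

# Assumption: every release follows the URL pattern of the example above.
MAVEN_BASE = "https://repo1.maven.org/maven2/org/querqy/querqy-elasticsearch"


def install_command(plugin_version: str) -> str:
    """Build the plugin install command for a full plugin version string."""
    url = f"{MAVEN_BASE}/{plugin_version}/querqy-elasticsearch-{plugin_version}.zip"
    return f'./bin/elasticsearch-plugin install "{url}"'


def plugin_is_loaded(cat_plugins_rows: list, name: str = "querqy") -> bool:
    """Return True if any node lists the plugin in the _cat/plugins rows."""
    return any(row.get("component") == name for row in cat_plugins_rows)


def fetch_cat_plugins(base_url: str = "http://localhost:9200") -> list:
    """Call GET /_cat/plugins?format=json (assumes no authentication)."""
    with urllib.request.urlopen(f"{base_url}/_cat/plugins?format=json") as resp:
        return json.load(resp)
```

Calling install_command("1.7.es892.0") reproduces the command shown above, and plugin_is_loaded(fetch_cat_plugins()) can serve as an alternative to searching the log after startup.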

Making queries using Querqy

Querqy defines its own query builder which can be executed with a rich set of parameters. We will walk through these parameters step by step, starting with a minimal query, which does not use any rewriter, then adding a ‘Common Rules Rewriter’ and finally explaining the full set of parameters, many of them not related to query rewriting but to search relevance tuning in general.

Minimal Query

POST /myindex/_search

 1{
 2    "query": {
 3        "querqy": {
 4            "matching_query": {
 5                "query": "notebook"
 6            },
 7            "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"]
 8        }
 9    }
10}

Querqy provides a new query builder, querqy (line #3), that can be used in a query just like any other Elasticsearch query type. The matching_query (#4) defines the query for which documents will be matched and retrieved.

The matching query is different from boosting queries, which only influence the ranking but not the matching. We will later see that Querqy allows you to specify information for boosting outside the matching_query object and that the set of matching documents can be changed in query rewriting, for example, by adding synonyms or by deleting query tokens.

The query element (#5) contains the query string. In most cases this is just the query string as it was typed into the search box by the user.

The list of query_fields (#7) specifies in which fields to search. A field name can have an optional field weight. In the example, the field weight for title is 3.0. The default field weight is 1.0. Field weights must be positive. We will later see that the query_fields can be applied to parts of the querqy query other than the matching_query as well. That’s why the query_fields list is not a child element of the matching_query.
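
If you build requests in application code, the field list can be generated from a mapping of field names to weights. The following is a minimal sketch: the helper names format_field and minimal_querqy_query are hypothetical and not part of Querqy; only the JSON structure they produce is taken from the example above.

```python
def format_field(name: str, weight: float = 1.0) -> str:
    """Render one query_fields entry, e.g. 'title^3.0'."""
    if weight <= 0:
        # Field weights must be positive.
        raise ValueError(f"field weight for {name!r} must be positive, got {weight}")
    # The default weight 1.0 can be omitted from the entry.
    return name if weight == 1.0 else f"{name}^{float(weight)}"


def minimal_querqy_query(query_string: str, field_weights: dict) -> dict:
    """Build the minimal request body: a matching_query plus query_fields."""
    return {
        "query": {
            "querqy": {
                "matching_query": {"query": query_string},
                # query_fields is a sibling of matching_query, not a child.
                "query_fields": [
                    format_field(name, weight)
                    for name, weight in field_weights.items()
                ],
            }
        }
    }


body = minimal_querqy_query(
    "notebook", {"title": 3.0, "brand": 2.1, "shortSummary": 1.0}
)
```

Here body has the same structure as the minimal query above and can be serialised with json.dumps and sent as the body of POST /myindex/_search.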

The combination of a query string with a list of fields and field weights resembles Elasticsearch’s built-in multi_match query. We will later see that there are some differences in matching and scoring.

Querqy inside the known Elasticsearch Query DSL

The following example shows how easy it is to replace an Elasticsearch query type like multi_match with a Querqy matching_query, so that you can profit from Querqy’s rewriters. Let’s say you have an index that contains forum posts and you want to find a certain post in the topic “hobby” that was made 10 to 12 days ago and was about “fishing”.

A simple Boolean query with a multi_match and a match query inside the must occurrence and a range query in the filter occurrence should do the trick.

POST /myindex/_search

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        },
        {
          "multi_match": {
            "query": "fishing",
            "fields": ["title", "content"]
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  }
}

To use the matching_query from the querqy query builder, your request would look like this:

POST /myindex/_search

 1  {
 2    "query": {
 3      "bool": {
 4        "must": [
 5          {
 6            "match": {
 7              "topic": "hobby"
 8            }
 9          },
10          {
11            "querqy": {
12              "matching_query": {
13                "query": "fishing"
14              },
15              "query_fields": ["title", "content"],
16              "rewriters": ["my_replace_rewriter", "my_common_rules"]
17            }
18          }
19        ],
20        "filter": [
21          {
22            "range": {
23              "dateField": {
24                "gte": "now-12d",
25                "lte": "now-10d"
26              }
27            }
28          }
29        ]
30      }
31    }
32  }

As you can see, to use a matching_query instead of a multi_match you need to use querqy (line #11) as a “wrapper” for the matching_query.
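
The same substitution can be done programmatically when the request bodies are built in code. The sketch below is an illustration, not a Querqy API: the function names are hypothetical, only the query and fields options of multi_match are carried over (any other multi_match options are dropped), and the rewriter names are assumed to refer to rewriters that have already been configured in Elasticsearch.

```python
import copy


def multi_match_to_querqy(clause: dict, rewriters: list) -> dict:
    """Wrap the query string and fields of a multi_match clause in a querqy clause."""
    multi_match = clause["multi_match"]
    return {
        "querqy": {
            "matching_query": {"query": multi_match["query"]},
            "query_fields": list(multi_match["fields"]),
            "rewriters": list(rewriters),
        }
    }


def use_querqy(request_body: dict, rewriters: list) -> dict:
    """Return a copy of a bool request with multi_match 'must' clauses replaced."""
    body = copy.deepcopy(request_body)  # leave the caller's dict untouched
    bool_query = body["query"]["bool"]
    bool_query["must"] = [
        multi_match_to_querqy(clause, rewriters) if "multi_match" in clause else clause
        for clause in bool_query["must"]
    ]
    return body


original = {
    "query": {
        "bool": {
            "must": [
                {"match": {"topic": "hobby"}},
                {"multi_match": {"query": "fishing", "fields": ["title", "content"]}},
            ],
            "filter": [
                {"range": {"dateField": {"gte": "now-12d", "lte": "now-10d"}}}
            ],
        }
    }
}

rewritten = use_querqy(original, ["my_replace_rewriter", "my_common_rules"])
```

The match clause and the range filter pass through unchanged; only the multi_match clause is replaced by its querqy counterpart. Keep in mind that matching and scoring can differ between the two query types, so the results of the two requests are not guaranteed to be identical.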

Where to go next