How to Set Up Search Suggestions in Solr

by Nadeem Aslam

The SuggestComponent in Solr provides users with automatic suggestions for query terms. If you need to offer search term suggestions based on the characters a user has typed into the search box, this is the component to use.

How to Set Up:

Define the schema for the index:
To keep the documents small, I trimmed the fields down to the bare minimum. This is done in schema.xml.

<fields>
<field name="_content" type="text_general" indexed="true" stored="false" />
<field name="_database" type="string" indexed="true" stored="true" />
<field name="_uniqueid" type="string" indexed="true" stored="true" required="true" />
<field name="_name" type="text_general" indexed="true" stored="true" />
<field name="_indexname" type="string" indexed="true" stored="true" />
<field name="_version" type="string" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true" />
</fields>

Then I added two fields that will be used by the suggester: one to store the suggestion text and another to store the weight of that suggestion. The suggestion field should be a text type and the weight field should be a float type. Both need to be stored in the index. In this case, these fields get their values from corresponding fields in our Sitecore instance. These fields can be added to documents based on your specific indexing strategy.

<field name="term" type="text_general" indexed="true" stored="true" />
<field name="weight" type="float" indexed="true" stored="true" />
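
As a rough illustration of what a document carrying these two fields might look like, here is a minimal Python sketch that builds a JSON payload of the kind Solr's update handler accepts. The helper name make_suggestion_doc, the sample ids and names, and the choice to copy the item name into term are all assumptions for illustration; in the setup described here the values come from Sitecore.

```python
import json

def make_suggestion_doc(unique_id, name, weight):
    """Return one document carrying the two suggester fields."""
    return {
        "_uniqueid": unique_id,   # required by the schema above
        "_name": name,
        "term": name,             # assumption: suggestion text mirrors the item name
        "weight": float(weight),  # float, matching the weight field's type
    }

# Hypothetical sample data, for illustration only.
docs = [
    make_suggestion_doc("item-1", "Solr Suggester", 10),
    make_suggestion_doc("item-2", "Solr Schema", 4),
]

# A JSON array of documents, ready to be posted to the index's update handler.
payload = json.dumps(docs)
```

How the weight is derived (popularity, recency, a fixed value) is up to your indexing strategy.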

Define a custom field type for the suggest component:

Next we need to add a new type that the suggester will use to analyze and build the suggestion fields. This particular type replaces every non-alphanumeric character with a space, tokenizes the contents of the field on whitespace, and lowercases the tokens so that matching is case-insensitive. This is not strictly necessary; existing types may be used. Again, this is done in schema.xml.

<types>
...
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
...
</types>
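
To get a feel for what this type does to a value, the chain can be approximated in a few lines of Python. This is only a rough sketch of the three steps, not Solr's own code; the function name analyze_suggest_type and the sample input are made up for illustration.

```python
import re

def analyze_suggest_type(text):
    """Approximate the suggestType chain: char filter, tokenizer, lowercase filter."""
    # PatternReplaceCharFilterFactory: each non-alphanumeric character becomes a space.
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)
    # WhitespaceTokenizerFactory: split on runs of whitespace.
    tokens = cleaned.split()
    # LowerCaseFilterFactory: lowercase every token.
    return [token.lower() for token in tokens]

tokens = analyze_suggest_type("Wi-Fi Router (2.4GHz)!")
# tokens is ["wi", "fi", "router", "2", "4ghz"]
```

Note how punctuation splits a value into extra tokens. That is worth keeping in mind when suggestion terms contain hyphens or dots.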

Define the suggest component for the index:

Now that we have the schema set up, we need to define a searchComponent that will do the suggesting. This is done in solrconfig.xml.

Add the following to the <config> node:

<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">fuzzySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="storeDir">fuzzy_suggestions</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">term</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
<lst name="suggester">
<str name="name">infixSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="indexPath">infix_suggestions</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">term</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>

lookupImpl
The first suggester uses the FuzzyLookupFactory, an FST-based suggester (Finite State Transducer) that matches terms starting with the provided characters while accounting for potential misspellings. This lookup implementation will not find terms where the provided characters are in the middle.

The second suggester uses the AnalyzingInfixLookupFactory, which looks inside the terms for matches. Its results also have <b> highlights around the provided characters inside the suggestions.
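
The difference between the two lookups can be sketched with a toy example. This is not how Lucene implements either one: the fuzzy (misspelling) part is left out entirely, and the infix matching is simplified to "any word in the term starts with the typed characters". The function names and sample terms are hypothetical.

```python
def prefix_matches(terms, typed):
    """Terms that start with the typed characters (prefix lookup, without the fuzziness)."""
    typed = typed.lower()
    return [term for term in terms if term.lower().startswith(typed)]

def infix_matches(terms, typed):
    """Terms where any word starts with the typed characters, with <b> highlights."""
    typed = typed.lower()
    results = []
    for term in terms:
        words = term.split()
        if any(word.lower().startswith(typed) for word in words):
            highlighted = " ".join(
                f"<b>{word[:len(typed)]}</b>{word[len(typed):]}"
                if word.lower().startswith(typed) else word
                for word in words
            )
            results.append(highlighted)
    return results

# Hypothetical suggestion terms, for illustration only.
terms = ["search suggestions", "advanced search", "schema design"]

prefix_only = prefix_matches(terms, "sea")
# prefix_only is ["search suggestions"]
infix = infix_matches(terms, "sea")
# infix is ["<b>sea</b>rch suggestions", "advanced <b>sea</b>rch"]
```

The prefix lookup misses "advanced search" because the typed characters are not at the start of the term, which is the main reason to define both suggesters.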

storeDir and indexPath

These parameters define the directory where the suggester structure will be stored after it is built. They should be set so that the data is available on disk without rebuilding.

field

The field to get the suggestions from. This could be a computed field or a copy field.

suggestAnalyzerFieldType

This parameter is set to the fieldType that will process information in the defined ‘field’. I suggest starting simple and adding complexity as the need arises.

This fieldType is completely independent of the analysis chain applied to the field you specify for your suggester. It’s perfectly reasonable for the two fieldTypes to be quite different.

The “string” fieldType should probably not be used. If a “string” type is appropriate for the use case, the TermsComponent will probably serve as well and it is much simpler.

buildOnStartup and buildOnCommit

Building the suggester data involves re-reading, decompressing, and adding the field from every document to the suggester. These two settings should generally both be set to “false”. buildOnStartup triggers a build every time Solr is started. buildOnCommit triggers a build every time documents are committed. With a smaller list of potential suggestions, the latter is acceptable.

Define a requestHandler for the suggest component:

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">infixSuggester</str>
<str name="suggest.dictionary">fuzzySuggester</str>
<str name="suggest.onlyMorePopular">true</str>
<str name="suggest.count">10</str>
<str name="suggest.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>

The “name” of the requestHandler defines the URL that will be used to request suggestions. In this case, it will be http://localhost:8983/solr/index_name/suggest. Your port number may be different.

The requestHandler definition contains two parts:

Defaults
These are settings that you would like to apply to each request. They may be provided in the querystring if different values are necessary.

Multiple “suggest.dictionary” values may be used. Each one will have its own section of results. The values are the names of the suggesters that were defined in the Suggest Component.

Components
The name of the Suggest Component is set here. This connects the handler to the component.

Actually getting suggestions
Once all of this is set up, using it is very simple. Assuming a Solr index URL like this:
http://localhost:8983/solr/index_name

Build the suggester:

Issue a request to http://localhost:8983/solr/index_name/suggest?suggest.build=true

Until you do this step, no suggestions are returned.
The two build settings (buildOnStartup and buildOnCommit) can be used to avoid this, but consider the size of your index and the time and CPU that will be required to build the suggest index automatically.
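
If you would rather trigger the build from a script (after a nightly reindex, for example), a minimal Python sketch could look like the one below. It assumes the handler is reachable at the index URL above, asks for JSON with wt=json, and treats a status of 0 in the responseHeader as success. The function name, the opener parameter, and the 600-second timeout are assumptions of this sketch, not Solr settings.

```python
import json
from urllib.request import urlopen

def build_suggesters(core_url, opener=urlopen, timeout=600):
    """Ask the /suggest handler to rebuild; return True when Solr reports status 0."""
    url = f"{core_url}/suggest?suggest.build=true&wt=json"
    # The build re-reads the field from every document, so allow a generous timeout.
    with opener(url, timeout=timeout) as response:
        body = json.load(response)
    return body.get("responseHeader", {}).get("status") == 0

# Example call (needs a running Solr instance):
# build_suggesters("http://localhost:8983/solr/index_name")
```

The opener parameter only exists so the function can be exercised without a live server; in normal use the default is enough.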

Ask for suggestions:
Issue a request to http://localhost:8983/solr/index_name/suggest?suggest.q=whatever
Additional parameters can be included, such as the count, the desired format (JSON or XML) or a specific suggest.dictionary.
Use the “wt” and “indent” parameters to format your results as JSON or XML and apply indenting, e.g. &wt=json&indent=true
The response will contain a “suggest” field. This field will contain one entry for each suggest.dictionary that was used. Each dictionary entry is keyed by the query text and holds a “numFound” field as well as a “suggestions” field containing an array of the found suggestions and their weights.
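
When the request is built in code rather than typed by hand, the parameters need to be URL-encoded, and suggest.dictionary may need to appear more than once. Here is a minimal Python sketch of that; the function name suggest_url and its arguments are made up for illustration, while the parameter names are the ones used in this post.

```python
from urllib.parse import urlencode

def suggest_url(core_url, typed, dictionaries=(), count=None, wt="json"):
    """Build a /suggest request URL for the characters the user has typed."""
    params = [("suggest.q", typed), ("wt", wt)]
    # One suggest.dictionary parameter per suggester; omit to use the handler defaults.
    params += [("suggest.dictionary", name) for name in dictionaries]
    if count is not None:
        params.append(("suggest.count", str(count)))
    return f"{core_url}/suggest?{urlencode(params)}"

url = suggest_url(
    "http://localhost:8983/solr/index_name",
    "whatever",
    dictionaries=["infixSuggester", "fuzzySuggester"],
    count=5,
)
```

Anything passed this way overrides the matching value in the handler's defaults for that one request.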
Response Format:

{
  suggest: {
    suggester_name: {
      suggest_query: { numFound: .., suggestions: [ {term: .., weight: .., payload: ..}, .. ]}
    }
  }
}
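
On the client side, the nested structure above usually needs to be flattened into a simple list for the search box. Here is a minimal Python sketch that walks the suggester and query levels. The function name and the sample response are made up for illustration; the sample only mirrors the shape shown above and is not real output.

```python
import json

def flatten_suggestions(response):
    """Return (suggester_name, term, weight) tuples from a parsed /suggest response."""
    rows = []
    for suggester_name, by_query in response.get("suggest", {}).items():
        for _query, result in by_query.items():
            for suggestion in result.get("suggestions", []):
                rows.append((suggester_name, suggestion["term"], suggestion["weight"]))
    return rows

# Made-up sample in the shape described above, for illustration only.
sample = json.loads("""
{
  "suggest": {
    "infixSuggester": {
      "sol": {
        "numFound": 2,
        "suggestions": [
          {"term": "<b>sol</b>r suggester", "weight": 10, "payload": ""},
          {"term": "apache <b>sol</b>r", "weight": 4, "payload": ""}
        ]
      }
    }
  }
}
""")

rows = flatten_suggestions(sample)
```

When more than one suggest.dictionary is used, the same term can show up under several suggesters, so you may want to de-duplicate the rows before showing them.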

Hope this blog has helped you.
