Page Search¶
For page searches on consumerfinance.gov, we use Elasticsearch and the django-opensearch-dsl library, which is a lightweight wrapper around opensearch-dsl-py.
- Indexing
    - Elasticsearch index configuration
    - Django model information
    - Custom fields
    - Helpers
    - Building the index
- Searching
- Autocomplete
- Suggestions
- References
Indexing¶
For any of our Django apps that need to implement search for their Django models or Wagtail page types, we include a documents.py file that defines the Elasticsearch documents that will map the model or page to the Elasticsearch index.
The Document class includes three things:
- The Elasticsearch index configuration
- The Django model information
- Custom fields to index, and any preparation that they require
We'll use our Ask CFPB answer search document as an example for each of these:
from django_opensearch_dsl import Document
from django_opensearch_dsl.registries import registry
@registry.register_document
class AnswerPageDocument(Document):
    pass
Elasticsearch index configuration¶
The index configuration is provided by an Index subclass on the document class that defines the django-opensearch-dsl index options.
from search.elasticsearch_helpers import environment_specific_index
class AnswerPageDocument(Document):
    class Index:
        name = environment_specific_index('ask-cfpb')
        settings = {'number_of_shards': 1,
                    'number_of_replicas': 0}
For index naming, we have a helper function, environment_specific_index, that will generate the index name specific to the deployment environment. This allows each index to be isolated to a deployment environment within an Elasticsearch cluster.
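The helper's actual naming scheme lives in search.elasticsearch_helpers, but a minimal sketch of the idea might look like this (the DEPLOY_ENVIRONMENT variable name and the base-name-plus-suffix format are assumptions for illustration, not the real implementation):

```python
import os


def environment_specific_index(base_name):
    # Sketch only: derive a per-environment index name. The variable
    # name and "<base_name>-<environment>" format are assumptions;
    # the real helper lives in search.elasticsearch_helpers.
    environment = os.environ.get("DEPLOY_ENVIRONMENT", "local")
    return f"{base_name}-{environment}"
```

Under this scheme, 'ask-cfpb' would become something like 'ask-cfpb-local' in a local environment, so indexes from different deployments never collide in a shared cluster.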
Django model information¶
The Django model information is provided by a Django class on the document class that defines the model and any field names to be indexed directly from the model. A get_queryset method can be overridden to perform any filtering on the model's queryset before content is indexed.
from ask_cfpb.models.answer_page import AnswerPage
class AnswerPageDocument(Document):
    def get_queryset(self):
        query_set = super().get_queryset()
        return query_set.filter(live=True, redirect_to_page=None)
    class Django:
        model = AnswerPage
        fields = [
            'search_tags',
            'language',
        ]
The fields in fields will be indexed without any preparation/manipulation, directly as they are stored on the model.
Custom fields¶
Sometimes it might be desirable to index a field as an alternative type: say, the string that corresponds to an integer for a Django field that specifies choices.
It might also be desirable to construct a field to index from multiple fields on the model, particularly for Wagtail pages with stream fields.
We may also want to specify Elasticsearch-specific field properties, like a custom analyzer.
To do so, we specify custom fields as attributes on the document class, with an attr argument that specifies the field on the model to reference.
from django_opensearch_dsl import fields
from search.elasticsearch_helpers import synonym_analyzer
class AnswerPageDocument(Document):
    text = fields.TextField(attr="text", analyzer=synonym_analyzer)
The attr on the model can be a @property or a Django model field.
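To illustrate the @property case, here is a library-free sketch of how a document field reads its attr from an instance at indexing time (AnswerPageSketch and its text property are hypothetical stand-ins, not the real AnswerPage model):

```python
class AnswerPageSketch:
    # Hypothetical stand-in for a Django model: "answer_content" plays
    # the role of a stored field, "text" of a derived @property.
    def __init__(self, answer_content):
        self.answer_content = answer_content

    @property
    def text(self):
        # Derived value that a TextField(attr="text") would index
        return self.answer_content.strip()


def read_attr(instance, attr):
    # A document field resolves attr by attribute lookup, so a
    # @property and a concrete model field behave identically here.
    return getattr(instance, attr)
```

Calling read_attr(page, "text") on an instance transparently invokes the property, which is why either kind of attribute works as an attr target.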
We can also do any data preparation/manipulation for fields using prepare_-prefixed methods.
from django_opensearch_dsl import fields
class AnswerPageDocument(Document):
    portal_topics = fields.KeywordField()
    def prepare_portal_topics(self, instance):
        return [topic.heading for topic in instance.portal_topic.all()]
Helpers¶
We provide a few common helpers in search.elasticsearch_helpers for use in creating document classes:
- environment_specific_index(base_name): Generates the index name for the base_name that is specific to the deployment environment. This allows each index to be isolated to a deployment environment within an Elasticsearch cluster.
- ngram_tokenizer: A reusable ngram analyzer for creating fields that autocomplete terms. This is used for type-ahead search boxes.
- synonym_analyzer: A reusable analyzer for creating fields that will match synonyms of a search term.
from search.elasticsearch_helpers import (
    environment_specific_index,
    ngram_tokenizer,
    synonym_analyzer,
)
Building the index¶
With the Document class created for your model in a documents.py module within a Django app listed in INSTALLED_APPS, all that is left is to use the django-opensearch-dsl management commands to rebuild the index:
./cfgov/manage.py opensearch index --force rebuild [INDEX]
./cfgov/manage.py opensearch document --force --indices [INDEX] --refresh --parallel index
Finally, the indexes for all apps can be rebuilt using:
./cfgov/manage.py opensearch index --force rebuild
./cfgov/manage.py opensearch document --force --refresh --parallel index
Searching¶
The document class provides a search() class method that returns a Search object. The Search object is opensearch-dsl-py's representation of Elasticsearch search requests.
To query for a specific term, for example:
from ask_cfpb.documents import AnswerPageDocument
AnswerPageDocument.search().query(
    "match", text={"query": search_term, "operator": "AND"}
)
We can also add a filter context before querying, which we do to limit results to a specific language in Ask CFPB:
from ask_cfpb.documents import AnswerPageDocument
search = AnswerPageDocument.search().filter("term", language=language)
search = search.query(
    "match", text={"query": search_term, "operator": "AND"}
)
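Both the filter context and the query context end up in a single bool query when the request is serialized. The Search object's to_dict() produces roughly this body ("en" and "mortgage" are example values):

```python
# Approximate request body produced by the chained filter() + query()
# calls above, as serialized by Search.to_dict(). "en" and "mortgage"
# are example values for language and search_term.
request_body = {
    "query": {
        "bool": {
            "filter": [{"term": {"language": "en"}}],
            "must": [
                {"match": {"text": {"query": "mortgage", "operator": "AND"}}}
            ],
        }
    }
}
```

Clauses added via filter() go in the non-scoring filter section, while query() clauses land in must and contribute to relevance scoring.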
Autocomplete¶
For search box autocomplete, we use a field with our ngram_tokenizer analyzer and then issue a "match" search query for that field.
Using the Ask CFPB document search above, with its language filter context, this looks like:
from search.elasticsearch_helpers import ngram_tokenizer
class AnswerPageDocument(Document):
    autocomplete = fields.TextField(analyzer=ngram_tokenizer)
search = AnswerPageDocument.search().filter("term", language=language)
search = search.query('match', autocomplete=search_term)
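To see why an ngram-analyzed field supports type-ahead, consider edge n-grams, one common choice for autocomplete analyzers: every prefix of each indexed term becomes its own token, so a partial query like "mort" matches a page containing "mortgage". A library-free sketch (the lengths here are illustrative; the project's actual ngram_tokenizer settings live in search.elasticsearch_helpers and may differ):

```python
def edge_ngrams(term, min_len=2, max_len=25):
    # Every prefix of the term between min_len and max_len characters.
    # The length bounds are illustrative assumptions, not the project's
    # actual tokenizer settings.
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]
```

Since edge_ngrams("mortgage") includes "mort", a match query for the partial term "mort" finds the indexed document, which is exactly the behavior a type-ahead search box needs.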
Suggestions¶
For suggested spelling corrections for search terms, the Search object has a suggest() method that provides spelling suggestions for a given term on a given field.
Using the Ask CFPB document search above, with its language filter context, this looks like:
from ask_cfpb.documents import AnswerPageDocument
search = AnswerPageDocument.search().filter("term", language=language)
search = search.suggest('suggestion', search_term, term={'field': 'text'})
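For reference, the suggest() call above adds a suggest section to the serialized request body, roughly as follows ("morgage" is an example misspelled search term):

```python
# Approximate suggest section added to the request body by the
# suggest() call above. "morgage" is an example misspelled term;
# "suggestion" is the name given as the first argument to suggest().
suggest_body = {
    "suggest": {
        "suggestion": {
            "text": "morgage",
            "term": {"field": "text"},
        }
    }
}
```

In the response, the candidate corrections for each input token are then available under the matching name, e.g. response.suggest.suggestion[0].options.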