Category Archives: ElasticSearch

ElasticSearch AutoComplete

Elasticsearch provides a great feature called search-as-you-type, which gives us a way to implement a search box that behaves like Google's. In a recent project we had the same requirement: our database held a large number of names, states, cities, companies and people, and we had to support ad-hoc queries. We chose ES as the tool to realize this use case.

Elasticsearch provides many features out of the box, and the one I like most is autocomplete. I remember the days when we used to implement this feature with AJAX, running queries like %ELASTIC% against the DB to get suggestions. ES takes a different approach to the problem: by using appropriate analyzers while indexing data, we can prepare the data in such a way that we don't have to perform phrase queries at search time, and can get the same functionality with exact-match queries.

For example, say we have a movie database with the entries below:
Reservoir Dogs
Airplane
Doctor Zhivago
The Deer Hunter
The Lord of the Rings

Using the standard analyzer, we get the inverted index below:

(Figure: InvertedIndex)

Now suppose everything is implemented with the standard analyzer. When we type Th, nothing is suggested, because there is no token Th in our inverted index (assuming we are not running a wildcard query such as Th* to get all words with that prefix, since that would be inefficient). When we type The, a match query returns two suggestions, The Deer Hunter and The Lord of the Rings, but we wanted these suggestions to pop up as soon as we typed Th.
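To make this concrete, here is a minimal Python sketch of the lookup described above. It is not Elasticsearch code: the analyze function is only a rough stand-in for the standard analyzer (split into words, lowercase), and the document IDs and the names analyze, build_index and match are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical document IDs for the movie titles listed earlier.
MOVIES = {
    1: "Reservoir Dogs",
    2: "Airplane",
    3: "Doctor Zhivago",
    4: "The Deer Hunter",
    5: "The Lord of the Rings",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Map each whole-word token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for token in analyze(title):
            index[token].add(doc_id)
    return index


def match(index, text):
    """Exact term lookup: documents containing any of the analyzed query tokens."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


index = build_index(MOVIES)
print(match(index, "Th"))   # [] because "th" is not a token in the index
print(match(index, "The"))  # [4, 5]
```

Only whole words end up as keys in the index, which is why the partial input Th finds nothing.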

To support this, Elasticsearch provides n-gram analyzers. For search-as-you-type we use a specialized form of n-grams called edge n-grams. Edge n-grams are anchored to the beginning of the word. Edge n-gramming the word quick would result in this (taken from the ES guide):
q
qu
qui
quic
quick

Using this analyzer, when we index a movie document with the title The Deer Hunter, we get the following n-grams:
T
Th
The
D
De
Dee
Deer
H
Hu
Hun
Hunt
Hunte
Hunter

These n-grams are the actual tokens present in our inverted index. When we type T, we make an exact-match query for the term T and get back two document IDs, along with their contents as requested.
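The same idea can be sketched in plain Python. Again this is only an illustration and not how Elasticsearch is implemented: the names edge_ngrams, index_analyze and suggest, the document IDs, and the gram sizes are assumptions. The sketch keeps the original casing so that its output lines up with the listing above; a real setup would normally lowercase tokens as well.

```python
import re
from collections import defaultdict

# Hypothetical document IDs for the movie titles listed earlier.
MOVIES = {
    1: "Reservoir Dogs",
    2: "Airplane",
    3: "Doctor Zhivago",
    4: "The Deer Hunter",
    5: "The Lord of the Rings",
}


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of a word, from min_gram up to max_gram characters."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def index_analyze(text):
    """Index-time analysis: split into words, then edge n-gram each word."""
    tokens = []
    for word in re.findall(r"\w+", text):
        tokens.extend(edge_ngrams(word))
    return tokens


def build_index(docs):
    """Map each edge n-gram to the set of document IDs that produced it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for token in index_analyze(title):
            index[token].add(doc_id)
    return index


def suggest(index, typed):
    """Search time: one exact lookup of what the user typed, with no n-gramming."""
    return sorted(index.get(typed, set()))


index = build_index(MOVIES)
print(edge_ngrams("quick"))   # ['q', 'qu', 'qui', 'quic', 'quick']
print(suggest(index, "T"))    # [4, 5]
print(suggest(index, "Th"))   # [4, 5]
```

The extra work happens once at index time, so each keystroke costs a single term lookup.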

For the full implementation of the behavior above, and the mapping to use, see the link below:

ESAutoComplete
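The linked mapping is not reproduced here. As a rough idea of what such index settings might look like, here is a sketch written as a Python dict that could be serialized to JSON and sent as the body of an index-creation request. The names autocomplete_filter, autocomplete and title, and the gram sizes, are assumptions. The layout assumes a recent Elasticsearch version with typeless mappings; older versions spell some of these options differently.

```python
import json

# Sketch of index settings for search-as-you-type with edge n-grams.
# All names and sizes here are illustrative, not taken from the linked post.
AUTOCOMPLETE_INDEX = {
    "settings": {
        "analysis": {
            "filter": {
                # Token filter that emits the prefixes of each token.
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                # Index-time analyzer: split into words, lowercase, edge n-gram.
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "autocomplete",
                # Do not n-gram the user's input at search time.
                "search_analyzer": "standard",
            }
        }
    },
}

print(json.dumps(AUTOCOMPLETE_INDEX, indent=2))
```

Using a plain analyzer at search time matters: if the typed text were n-grammed too, typing The would also match every title containing a word that starts with t.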

ElasticSearch save field explained!

If you are indexing a document in Elasticsearch that has a field (name: word press) and you don't want to store this field, you can still retrieve it as long as you have not disabled _source (enabled by default).

By default, Elasticsearch saves every document that you send to it and is therefore able to give it back when requested. Lucene, on the other hand, has its own kind of storage where you can store the fields that you want to retrieve when a document ID is provided.

For example, suppose the following document is indexed in ES with I1 as the index name and _source enabled (store is disabled by default):
{
"_id": 1,
"name": "amit hora"
}

When you query for all documents having name="a*", you will get the above document. The reason is that ES has the _source field enabled by default and returns it with the query result. In this case it parses the saved document and returns the name field of the doc whose value matches "a*".

If you had the _source field disabled, you would have to store the name field explicitly (store: true) for it to be retrieved when requested. Searching itself depends on the field being indexed, not on it being stored.
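As a sketch of that second setup, the two request bodies below are written as Python dicts that could be serialized to JSON. The first would be the body used when creating the index, the second a search that asks for the stored name field. The variable names are made up, and the layout assumes a recent Elasticsearch version with typeless mappings.

```python
import json

# Body for creating the index: _source is switched off, so the name field
# has to be stored explicitly if we want to get its value back.
INDEX_BODY = {
    "mappings": {
        "_source": {"enabled": False},
        "properties": {
            "name": {"type": "text", "store": True},
        },
    }
}

# Body for a search: match names starting with "a" and return the stored
# name field, since there is no _source to return.
SEARCH_BODY = {
    "query": {"wildcard": {"name": {"value": "a*"}}},
    "stored_fields": ["name"],
}

print(json.dumps(INDEX_BODY, indent=2))
print(json.dumps(SEARCH_BODY, indent=2))
```

With _source left enabled, neither the store option nor stored_fields would be needed to get the name back.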

Keep in mind, though, that retrieving many stored fields from Lucene might require one disk seek per field, while retrieving only the _source from Lucene and parsing it to extract the needed fields is just a single disk seek.