Elasticsearch in PHP doesn't recognize the dash (-)


I'm working on a project and trying to make Elasticsearch search a field that can contain a dash. When I search, it can't find the result I'm looking for, and trying to change the mapping of the index didn't work at all. There is no error message; the indexed document simply isn't found when I search on that field. Here is what I did:

    $params = [
        'index' => 'arc',
        'type'  => 'purchase',
        'id'    => $purchase['id'],
        'body'  => $purchase
    ];

It worked great, except for the field containing a dash. $purchase looks like this:

    array:34 [
        "id" => 163160
        "distant" => "mor-938bbm28147090"
        [...]
    ]

So when I search for "mor" I find the result, but with "mor-" I get nothing. I tried to change the mapping by doing this:

    $params = [
        'index' => 'arc',
        'type' => 'purchase',
        'id' => $purchase['id'],
        'body' => [
            'mappings' => [
                '_default_' => [
                    'properties' => [
                        'distant' => [
                            'type' => 'string',
                            'index' => 'not_analyzed'
                        ]
                    ]
                ]
            ],
            $purchase
        ]
    ];

But after that, if I try to search for "163160" I can't find the result at all.

The whitespace analyzer is the right solution in this case. It takes only whitespace into account while breaking text into tokens, so characters like "-" or "_" are still treated as part of the term.
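To see the difference, here is a rough Python sketch of how the two tokenizers would break up this value, assuming the field was previously using the default standard analyzer. This is a simplified approximation of the Lucene tokenizers, not the actual implementation:

```python
import re

def standard_tokens(text):
    # Rough approximation of the standard tokenizer: only runs of
    # letters/digits survive, so punctuation such as "-" acts as a
    # split character and is dropped from the tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def whitespace_tokens(text):
    # The whitespace tokenizer splits on whitespace only,
    # so "-" stays inside the token.
    return text.lower().split()

print(standard_tokens("mor-938bbm28147090"))    # ['mor', '938bbm28147090']
print(whitespace_tokens("mor-938bbm28147090"))  # ['mor-938bbm28147090']
```

This is why searching for "mor" matches (it is a token on its own under the standard analyzer) while "mor-" matches nothing: no indexed token contains the dash.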

But if you also need partial matching, for example on the "mor-" prefix, it requires a slightly more complicated mapping.

As I don't know PHP, I'll be using plain Elasticsearch syntax. First, create the proper mapping:

    PUT http://127.0.0.1:9200/arc
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "edge_ngram_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "edge_ngram",
              "min_gram": 3,
              "max_gram": 18,
              "token_chars": [
                "letter",
                "digit",
                "punctuation"
              ]
            }
          }
        }
      },
      "mappings": {
        "purchase": {
          "properties": {
            "distant": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }

As you can see, I use the edge_ngram tokenizer here. When you index a document with mor-938bbm28147090 in the distant field, it creates the following tokens:

    [mor, mor-, mor-9, mor-93, mor-938, mor-938b, mor-938bb, ...]

The core point here is the punctuation character class in the token_chars list, which tells Elasticsearch that the dash character (and other punctuation such as ! or ") should be included in tokens and not treated as a split character.
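The token generation above can be sketched in Python. This is a simplified simulation of the edge_ngram tokenizer with the parameters from the mapping (min_gram=3, max_gram=18); punctuation is preserved because we split only on whitespace, mirroring a token_chars list that includes letter, digit, and punctuation:

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=18):
    # Split only on whitespace, so "-" stays inside each token
    # (the effect of including "punctuation" in token_chars).
    tokens = re.split(r"\s+", text.strip())
    grams = []
    for tok in tokens:
        # Emit every prefix of the token from min_gram to max_gram chars.
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            grams.append(tok[:n])
    return grams

print(edge_ngrams("mor-938bbm28147090"))
# ['mor', 'mor-', 'mor-9', 'mor-93', ..., 'mor-938bbm28147090']
```

Because "mor-93" is emitted as one of these prefix tokens at index time, an exact term lookup on it can succeed.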

Now, when I index the document:

    PUT http://127.0.0.1:9200/arc/purchase/163160
    {
        "distant": "mor-938bbm28147090"
    }

and run a term search query:

    POST http://127.0.0.1:9200/arc/purchase/_search
    {
        "query": {
            "bool": {
                "must": {
                    "term": {
                        "distant": "mor-93"
                    }
                }
            }
        }
    }

I get in the response:

"hits": {     "total": 1,     "max_score": 0.6337049,     "hits": [         {             "_index": "arc",             "_type": "purchase",             "_id": "163160",             "_score": 0.6337049,             "_source": {                 "distant": "mor-938bbm28147090"             }         }     ] } 
