Elasticsearch in PHP doesn't recognize dash (-)
I'm working on a project where I'm trying to search an Elasticsearch field that can contain a dash. When I search, I can't find the result I'm looking for. I tried changing the mapping of the index, but that doesn't work at all. I don't get an error message; I just can't find what I indexed, even when searching on a different field. Here is what I did:
$params = [ 'index' => 'arc', 'type' => 'purchase', 'id' => $purchase['id'], 'body' => $purchase ];
It worked great except for the field containing a dash. $purchase looks like this:
array:34 [ "id" => 163160 "distant" => "mor-938bbm28147090" [...] ]
So when I search for "mor" I find the result, but when I search for "mor-" I get nothing. I tried to change the mapping by doing:
$params = [
    'index' => 'arc',
    'type' => 'purchase',
    'id' => $purchase['id'],
    'body' => [
        'mappings' => [
            '_default_' => [
                'properties' => [
                    'distant' => [
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    ]
                ]
            ]
        ],
        $purchase
    ]
];
But now if I try to search for "163160" I can't find any result.
The whitespace analyzer can be the right solution in this case. It takes only whitespace into account while breaking text into tokens, so characters like "-" or "_" are still treated as part of a term.
But if you need partial matching, for example on the "mor-" token, it requires a slightly more complicated mapping.
As I don't know PHP, I'll be using the Elasticsearch REST syntax. First, create a proper mapping:
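To make the difference concrete, here is a minimal Python sketch that contrasts whitespace-only splitting with a word-based split. It is not Elasticsearch code: `whitespace_tokens` and `word_tokens` are hypothetical helpers, and the regex is only a rough stand-in for a word-based tokenizer.

```python
import re


def whitespace_tokens(text):
    # Split on whitespace only; dashes and underscores stay inside the term.
    return text.split()


def word_tokens(text):
    # Rough approximation of a word-based tokenizer (an assumption, not the
    # real implementation): keep runs of letters, digits and underscores,
    # lowercase them, and treat everything else, including "-", as a separator.
    return [t.lower() for t in re.findall(r"\w+", text)]


value = "mor-938bbm28147090"
print(whitespace_tokens(value))  # ['mor-938bbm28147090']
print(word_tokens(value))        # ['mor', '938bbm28147090']
```

With the whitespace split the whole value is a single term, so only a search for the complete value can match it exactly.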
PUT http://127.0.0.1:9200/arc
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 18,
          "token_chars": [ "letter", "digit", "punctuation" ]
        }
      }
    }
  },
  "mappings": {
    "purchase": {
      "properties": {
        "distant": {
          "type": "string",
          "analyzer": "edge_ngram_analyzer"
        }
      }
    }
  }
}
As you can see, I use the edge n-gram tokenizer here. When you index a document with mor-938bbm28147090 in the distant field, it will create the following tokens:
[mor, mor-, mor-9, mor-93, mor-938, mor-938b, mor-938bb, ...]
The core point here is the punctuation character class in the token_chars list. It tells Elasticsearch that the dash character (and others like ! or ") should be included in a token and not treated as a "split char".
Now when I index the document:
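The effect of token_chars can be simulated in a few lines of Python. This is only a sketch of the idea, not the actual tokenizer: `in_classes` and `edge_ngrams` are hypothetical helpers, and mapping the "letter", "digit" and "punctuation" classes onto Unicode categories is an assumption made for illustration.

```python
import unicodedata


def in_classes(ch, classes):
    # Assumed mapping of token_chars classes onto Unicode general categories.
    cat = unicodedata.category(ch)
    return (
        ("letter" in classes and cat.startswith("L"))
        or ("digit" in classes and cat == "Nd")
        or ("punctuation" in classes and cat.startswith("P"))
    )


def edge_ngrams(text, min_gram, max_gram, token_chars):
    # 1. Split the text wherever a character is outside the allowed classes.
    runs, current = [], ""
    for ch in text:
        if in_classes(ch, token_chars):
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    # 2. Emit the prefixes of each run, from min_gram to max_gram characters.
    grams = []
    for run in runs:
        for n in range(min_gram, min(max_gram, len(run)) + 1):
            grams.append(run[:n])
    return grams


value = "mor-938bbm28147090"
with_punct = edge_ngrams(value, 3, 18, ["letter", "digit", "punctuation"])
without_punct = edge_ngrams(value, 3, 18, ["letter", "digit"])
print(with_punct[:4])     # ['mor', 'mor-', 'mor-9', 'mor-93']
print(without_punct[:3])  # ['mor', '938', '938b']
```

Without "punctuation" the dash acts as a separator, so no token containing "mor-" is ever produced.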
PUT http://127.0.0.1:9200/arc/purchase/163160
{ "distant": "mor-938bbm28147090" }
and run a term search query:
POST http://127.0.0.1:9200/arc/purchase/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "distant": "mor-93" }
      }
    }
  }
}
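A term query is not analyzed, so it only matches when the query string is exactly equal to one of the indexed tokens. The following Python sketch illustrates which inputs can match under this mapping; `edge_prefixes` and `term_matches` are hypothetical helpers that model the token list as a plain set, not how Elasticsearch stores it.

```python
def edge_prefixes(value, min_gram=3, max_gram=18):
    # Prefixes of the whole value, assuming the dash is kept inside the token.
    return {value[:n] for n in range(min_gram, min(max_gram, len(value)) + 1)}


indexed = edge_prefixes("mor-938bbm28147090")


def term_matches(term):
    # Exact lookup: no lowercasing or splitting is applied to the query.
    return term in indexed


print(term_matches("mor-93"))  # True: an indexed prefix
print(term_matches("mor-"))    # True: an indexed prefix
print(term_matches("mo"))      # False: shorter than min_gram (3)
print(term_matches("938"))     # False: not a prefix of the value
print(term_matches("MOR-"))    # False: the analyzer has no lowercase filter
```

So queries shorter than three characters, or queries starting in the middle of the value, will not match with these settings.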
The search response contains:
"hits": { "total": 1, "max_score": 0.6337049, "hits": [ { "_index": "arc", "_type": "purchase", "_id": "163160", "_score": 0.6337049, "_source": { "distant": "mor-938bbm28147090" } } ] }