Elasticsearch - aggregation on a map, for each key
I have the following kind of documents.
Document 1
{ "doc": { "id": 1, "errors": { "e1":5, "e2":20, "e3":30 }, "warnings": { "w1":1, "w2":2 } } }
Document 2
{ "doc": { "id": 2, "errors": { "e1":10 }, "warnings": { "w1":1, "w2":2, "w3":33 } } }
I would like to get the following summed stats in one or more calls. Is this possible? I tried various solutions, but they all work only when the keys are known. In my case the map keys (e1, e2, etc.) are not known.
{ "errors": { "e1": 15, "e2": 20, "e3": 30 }, "warnings": { "w1": 2, "w2": 4, "w3": 33 } }
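For reference, what is being asked for is a key-by-key sum over the maps of all documents. Here is a minimal client-side sketch of that computation in Python, run on the two documents above. The helper name `sum_maps` is made up for illustration and is not part of any Elasticsearch API.

```python
from collections import Counter

def sum_maps(docs, field):
    """Sum the values of d["doc"][field] key by key over all documents."""
    totals = Counter()
    for d in docs:
        # A document may lack the field entirely; treat that as an empty map.
        for key, value in d["doc"].get(field, {}).items():
            totals[key] += value
    return dict(totals)

docs = [
    {"doc": {"id": 1, "errors": {"e1": 5, "e2": 20, "e3": 30},
             "warnings": {"w1": 1, "w2": 2}}},
    {"doc": {"id": 2, "errors": {"e1": 10},
             "warnings": {"w1": 1, "w2": 2, "w3": 33}}},
]

# No key names are hard-coded; they are discovered while iterating.
result = {field: sum_maps(docs, field) for field in ("errors", "warnings")}
print(result)
```

This only pins down the expected result; it pulls every document to the client, so it is not a substitute for doing the sum inside Elasticsearch.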
There are two solutions, neither of them pretty. I have to point out that option 2 should be the preferred way to go, since option 1 uses an experimental feature.
1. Dynamic mapping, [experimental] scripted aggregation
Inspired by this answer and the scripted metric aggregation page of the ES docs, I began by inserting the documents into a non-existing index (which by default creates a dynamic mapping).
NB: I tested this on ES 5.4, but the documentation suggests that this feature is available since at least 2.0.
The resulting aggregation query is the following:
POST /my_index/my_type/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "errors": {
      "scripted_metric": {
        "init_script": "params._agg.errors = [:]",
        "map_script": "for (t in params['_source']['doc']['errors'].entrySet()) { params._agg.errors[t.key] = params._agg.errors.containsKey(t.key) ? params._agg.errors[t.key] + t.value : t.value }",
        "combine_script": "return params._agg.errors",
        "reduce_script": "Map res = [:]; for (a in params._aggs) { for (t in a.entrySet()) { res[t.key] = res.containsKey(t.key) ? res[t.key] + t.value : t.value } } return res"
      }
    },
    "warnings": {
      "scripted_metric": {
        "init_script": "params._agg.warnings = [:]",
        "map_script": "for (t in params['_source']['doc']['warnings'].entrySet()) { params._agg.warnings[t.key] = params._agg.warnings.containsKey(t.key) ? params._agg.warnings[t.key] + t.value : t.value }",
        "combine_script": "return params._agg.warnings",
        "reduce_script": "Map res = [:]; for (a in params._aggs) { for (t in a.entrySet()) { res[t.key] = res.containsKey(t.key) ? res[t.key] + t.value : t.value } } return res"
      }
    }
  }
}
Which produces this output:
{ ... "aggregations": { "warnings": { "value": { "w1": 2, "w2": 4, "w3": 33 } }, "errors": { "value": { "e1": 15, "e2": 20, "e3": 30 } } } }
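To make the four phases of a scripted metric easier to follow, here is a rough Python simulation of them. It is only a sketch of the flow as I understand it (init and combine once per shard, map once per document, reduce once on the coordinating node); the function names and the shard layout are assumptions made for illustration, not Elasticsearch APIs. Note that the map step accumulates, so two documents that land on the same shard are summed too.

```python
def init_state():
    # init_script: runs once per shard and creates the empty per-shard map.
    return {}

def map_doc(state, source_map):
    # map_script: runs once per matching document on that shard.
    for key, value in source_map.items():
        state[key] = state.get(key, 0) + value

def combine(state):
    # combine_script: runs once per shard and returns its partial result.
    return state

def reduce_shards(partials):
    # reduce_script: runs once on the coordinating node over all partials.
    res = {}
    for partial in partials:
        for key, value in partial.items():
            res[key] = res.get(key, 0) + value
    return res

# Hypothetical layout: each inner list holds the "errors" maps on one shard.
shards = [
    [{"e1": 5, "e2": 20, "e3": 30}],
    [{"e1": 10}],
]

partials = []
for shard_docs in shards:
    state = init_state()
    for source_map in shard_docs:
        map_doc(state, source_map)
    partials.append(combine(state))

errors = reduce_shards(partials)
print(errors)
```

The result should not depend on how the documents are spread over shards, which is an easy property to check with a sketch like this.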
If you are following this path, you might be interested in the Javadoc of what params['_source'] is underneath.
Warning: I believe scripted aggregation is not very efficient, and for better performance you should check out option 2 or a different data processing engine.
What does experimental mean:
This functionality is experimental and may be changed or removed in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.
With this in mind, we proceed to option 2.
2. Static nested mapping, nested aggregation
Here the idea is to store your data differently, so that it can be queried and aggregated differently. First, we need to create a mapping using the nested data type.
PUT /my_index_nested/
{
  "mappings": {
    "my_type": {
      "properties": {
        "errors": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "val": { "type": "integer" }
          }
        },
        "warnings": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "val": { "type": "integer" }
          }
        }
      }
    }
  }
}
A document in such an index will look like this:
{
  "_index": "my_index_nested",
  "_type": "my_type",
  "_id": "1",
  "_score": 1,
  "_source": {
    "errors": [
      { "name": "e1", "val": 5 },
      { "name": "e2", "val": 20 },
      { "name": "e3", "val": 30 }
    ],
    "warnings": [
      { "name": "w1", "val": 1 },
      { "name": "w2", "val": 2 }
    ]
  }
}
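Since existing documents store errors and warnings as maps, they have to be reshaped before being indexed into the nested index. A minimal sketch of that conversion in Python could look like this; `to_nested` is a hypothetical helper name, and dropping the outer "doc" wrapper is an assumption based on the example document above.

```python
def to_nested(source):
    """Turn {"e1": 5, ...} maps into [{"name": "e1", "val": 5}, ...] lists."""
    inner = source["doc"]
    return {
        # Each map entry becomes one nested object with a name and a val.
        field: [{"name": k, "val": v} for k, v in inner.get(field, {}).items()]
        for field in ("errors", "warnings")
    }

original = {
    "doc": {
        "id": 1,
        "errors": {"e1": 5, "e2": 20, "e3": 30},
        "warnings": {"w1": 1, "w2": 2},
    }
}

nested = to_nested(original)
print(nested)
```

The returned dict is what would be sent as the document body, with the original id used as the document's _id.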
Next we need to write the aggregate query. First we need to use the nested aggregation, which allows us to query this special nested data type. Since we want to aggregate by name and sum the values of val, we also need a sub-aggregation.
The resulting query is as follows (I am adding comments alongside the query for clarity):
POST /my_index_nested/my_type/_search
{
  "size": 0,
  "aggs": {
    "errors_top": {
      "nested": {
        // declare which nested objects we want to work with
        "path": "errors"
      },
      "aggs": {
        "errors": {
          // what we are aggregating - different values of name
          "terms": { "field": "errors.name" },
          // sub aggregation
          "aggs": {
            "error_sum": {
              // sum all val for the same name
              "sum": { "field": "errors.val" }
            }
          }
        }
      }
    },
    "warnings_top": {
      // analogous to errors
    }
  }
}
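Because the warnings part is analogous to the errors part, the request body can be generated instead of written twice. Below is a small Python sketch that builds it for any list of nested paths. The helper name `nested_sum_agg` and the `<path>_sum` naming of the sum aggregation are my own choices for illustration (the query above calls it error_sum); `size` is the terms aggregation's bucket limit.

```python
import json

def nested_sum_agg(path, size=10):
    """Build nested -> terms(name) -> sum(val) for one nested field."""
    return {
        "nested": {"path": path},
        "aggs": {
            path: {
                # One bucket per distinct name; size caps the bucket count.
                "terms": {"field": f"{path}.name", "size": size},
                "aggs": {
                    f"{path}_sum": {"sum": {"field": f"{path}.val"}}
                },
            }
        },
    }

body = {
    "size": 0,
    "aggs": {f"{p}_top": nested_sum_agg(p) for p in ("errors", "warnings")},
}

# Comments are not valid JSON, so the generated body has none.
print(json.dumps(body, indent=2))
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so if there can be more distinct keys than that, pass a larger `size`.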
The output of the query will look like:
{
  ...
  "aggregations": {
    "errors_top": {
      "doc_count": 4,
      "errors": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "e1", "doc_count": 2, "error_sum": { "value": 15 } },
          { "key": "e2", "doc_count": 1, "error_sum": { "value": 20 } },
          { "key": "e3", "doc_count": 1, "error_sum": { "value": 30 } }
        ]
      }
    },
    "warnings_top": { ... }
  }
}
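This response is bucket-shaped rather than the flat map asked for in the question, so a small post-processing step is needed on the client. A possible sketch in Python is shown here; `flatten` and its `mapping` argument are hypothetical names, and the sample input simply copies the errors part of the output above.

```python
def flatten(aggregations, mapping):
    """mapping: output name -> (nested agg, terms agg, sum agg) names."""
    out = {}
    for name, (top, terms, total) in mapping.items():
        buckets = aggregations[top][terms]["buckets"]
        # Each bucket key is a map key; the sum sub-aggregation is its total.
        out[name] = {b["key"]: b[total]["value"] for b in buckets}
    return out

aggregations = {
    "errors_top": {
        "doc_count": 4,
        "errors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e1", "doc_count": 2, "error_sum": {"value": 15}},
                {"key": "e2", "doc_count": 1, "error_sum": {"value": 20}},
                {"key": "e3", "doc_count": 1, "error_sum": {"value": 30}},
            ],
        },
    }
}

flat = flatten(aggregations, {"errors": ("errors_top", "errors", "error_sum")})
print(flat)
```

Adding a "warnings" entry to `mapping` would handle the warnings_top part the same way once its aggregation names are filled in.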
Hope that helps.