Elasticsearch: how to get a list of the most common words across documents
Question
I have a temporary index with documents that I need to moderate. I want to group these documents by the words they contain.
For example, I have the following documents:
1 - "aaa bbb ccc ddd eee fff"
2 - "bbb mmm aaa fff xxx"
3 - "hhh aaa fff"
So, I want to get the most popular words, ideally with counts: "aaa" - 3, "fff" - 3, "bbb" - 2, etc.
Is this possible with elasticsearch?
Recommended answer
A simple terms aggregation will meet your needs (where mydata is the name of your field):
curl -XGET 'http://localhost:9200/test/data/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "mydata_agg": {
      "terms": { "field": "mydata" }
    }
  }
}'
This returns:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "mydata_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "aaa",
        "doc_count" : 3
      }, {
        "key" : "fff",
        "doc_count" : 3
      }, {
        "key" : "bbb",
        "doc_count" : 2
      }, {
        "key" : "ccc",
        "doc_count" : 1
      }, {
        "key" : "ddd",
        "doc_count" : 1
      }, {
        "key" : "eee",
        "doc_count" : 1
      }, {
        "key" : "hhh",
        "doc_count" : 1
      }, {
        "key" : "mmm",
        "doc_count" : 1
      }, {
        "key" : "xxx",
        "doc_count" : 1
      } ]
    }
  }
}
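To make the meaning of doc_count concrete, here is a minimal Python sketch that reproduces the same buckets locally from the three sample documents. It is only an illustration, not how Elasticsearch works internally. It assumes words are split on whitespace, which stands in for whatever analyzer is configured on mydata, and it assumes ties are ordered by key ascending, which matches the response shown above. The function name doc_counts is hypothetical.

```python
from collections import Counter

# The three sample documents from the question.
docs = [
    "aaa bbb ccc ddd eee fff",
    "bbb mmm aaa fff xxx",
    "hhh aaa fff",
]


def doc_counts(documents):
    """Count, for each word, how many documents contain it at least once."""
    counts = Counter()
    for text in documents:
        # set() counts a word once per document, mirroring doc_count
        # (number of documents), not the total number of occurrences.
        # Assumption: whitespace splitting approximates the field's analyzer.
        counts.update(set(text.split()))
    # Sort by count descending, then by word ascending to break ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


buckets = doc_counts(docs)
for word, count in buckets:
    print(f'"{word}" - {count}')
```

Note that a word repeated several times inside one document still contributes only 1 to its count, which is also what doc_count reports in the response.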