Elasticsearch - How to get a list of the most common words in documents


Problem description

I have a temporary index with documents that I need to moderate. I want to group these documents by the words they contain.

For example, I have the following documents:

1 - "aaa bbb ccc ddd eee fff"

2 - "bbb mmm aaa fff xxx"

3 - "hhh aaa fff"

So, I want to get the most popular words, ideally with counts: "aaa" - 3, "fff" - 3, "bbb" - 2, etc.

Is this possible with Elasticsearch?

Recommended answer

A simple terms aggregation search will meet your needs:

(where mydata is the name of your field)

curl -XGET 'http://localhost:9200/test/data/_search?search_type=count&pretty' -d '{
  "query": {
    "match_all" : {}
  },
  "aggs" : {
    "mydata_agg" : {
      "terms" : { "field" : "mydata" }
    }
  }
}'

This will return:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "mydata_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "aaa",
        "doc_count" : 3
      }, {
        "key" : "fff",
        "doc_count" : 3
      }, {
        "key" : "bbb",
        "doc_count" : 2
      }, {
        "key" : "ccc",
        "doc_count" : 1
      }, {
        "key" : "ddd",
        "doc_count" : 1
      }, {
        "key" : "eee",
        "doc_count" : 1
      }, {
        "key" : "hhh",
        "doc_count" : 1
      }, {
        "key" : "mmm",
        "doc_count" : 1
      }, {
        "key" : "xxx",
        "doc_count" : 1
      } ]
    }
  }
}
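
To make the meaning of doc_count concrete, here is a minimal Python sketch that reproduces the same counts locally for the three example documents and pulls the (word, count) pairs out of a response like the one above. It is only an illustration, not how Elasticsearch computes the result: the helper names local_doc_counts and top_terms are hypothetical, the lowercase-and-split-on-whitespace step is a simplifying assumption standing in for the field's analyzer, and the alphabetical tie-break was chosen to match the order of the sample output.

```python
from collections import Counter


def local_doc_counts(docs):
    """Count, for each word, how many documents contain it at least once."""
    counts = Counter()
    for text in docs:
        # set() so a word repeated inside one document is counted once,
        # mirroring doc_count (documents, not occurrences).
        # Assumption: lowercase + whitespace split approximates the analyzer.
        counts.update(set(text.lower().split()))
    # Highest count first; ties broken alphabetically to match the sample output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def top_terms(response, agg_name="mydata_agg"):
    """Extract (word, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# The three example documents from the question.
docs = [
    "aaa bbb ccc ddd eee fff",
    "bbb mmm aaa fff xxx",
    "hhh aaa fff",
]

# Trimmed copy of the response shown above (first three buckets only).
sample_response = {
    "aggregations": {
        "mydata_agg": {
            "buckets": [
                {"key": "aaa", "doc_count": 3},
                {"key": "fff", "doc_count": 3},
                {"key": "bbb", "doc_count": 2},
            ]
        }
    }
}

if __name__ == "__main__":
    print(local_doc_counts(docs)[:3])
    print(top_terms(sample_response))
```

Two things to keep in mind when adapting the query itself: a terms aggregation returns only the top 10 buckets unless a "size" is set inside "terms", and newer Elasticsearch versions no longer accept search_type=count, so a top-level "size": 0 in the request body is used there instead to suppress the hits.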
