This article explains how to get all available filters (aggregations) from an Elasticsearch index; the approach may be a useful reference for anyone facing a similar problem.

Problem description

Suppose I have:

"hits": [
  {
    "_index": "products",
    "_type": "product",
    "_id": "599c2b3fc991ee0a597034fa",
    "_score": 1,
    "_source": {
      "attributes": {
        "1": [ "a" ],
        "2": [ "b", "c" ],
        "3": [ "d", "e" ],
        "4": [ "f", "g" ],
        "5": [ "h", "i" ]
      }
    }
  },
  {
    "_index": "products",
    "_type": "product",
    "_id": "599c4bb4b970c25976ced8bd",
    "_score": 1,
    "_source": {
      "attributes": {
        "1": [ "z" ],
        "2": [ "y" ]
      }
    }
  }
]

Each product has attributes. Each attribute has an ID and a value. I can filter the products by attributes just fine, but for now I am building the list of "possible attributes" from MongoDB. I would like to find a way to generate such a list from Elasticsearch alone (and perhaps only query MongoDB for additional data).

What I need is:

{
  1: [a, z],
  2: [b, c, y],
  etc.
}

What would such an aggregation look like? How do I get all available attributes (grouped by attribute ID) together with all of their possible values across all products?

Recommended answer

You cannot do it in one query, but it is fairly easy in two.

Retrieving the fields

You can use the mapping to get all the fields present in your documents:

curl -XGET "http://localhost:9200/your_index/your_type/_mapping"

Retrieving their values

You can then use multiple terms aggregations to get all the values of each field:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Values": {
      "terms": { "field": "field1", "size": 20 }
    },
    "field2Values": {
      "terms": { "field": "field2", "size": 20 }
    },
    "field3Values": {
      "terms": { "field": "field3", "size": 20 }
    },
    ...
  }
}'

This retrieves the 20 most frequent values for each field.

The limit of 20 values is a restriction to keep the response from growing huge (if you have, say, a few billion documents with a unique field). You can raise it by changing the "size" parameter of each terms aggregation. Given your requirements, choosing something roughly 10x larger than a rough estimate of the number of distinct values taken by each field should do the trick.

You can also run an intermediate query with the cardinality aggregation to get the actual number of distinct values, and then use that as the size of your terms aggregation. Note that cardinality is an estimate when it comes to large numbers, so you may want to use cardinality * 2:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Cardinality": {
      "cardinality": { "field": "field1" }
    },
    "field2Cardinality": {
      "cardinality": { "field": "field2" }
    },
    "field3Cardinality": {
      "cardinality": { "field": "field3" }
    },
    ...
  }
}'
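As a minimal sketch of how the two steps chain together (the cardinality value of 42, and doubling it to 84, are purely illustrative; field1 is the same placeholder field name used above):

# The cardinality query above returns a response of the form:
#   "aggregations": { "field1Cardinality": { "value": 42 }, ... }
# Feed roughly twice that number back in as the terms size:
curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Values": {
      "terms": { "field": "field1", "size": 84 }
    }
  }
}'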
Dealing with a huge cardinality

If there are not too many different attributes, the previous approach works. If there are, you should change the way the documents are stored to prevent a mapping explosion. Storing them like this:

{
  "attributes": [
    { "name": "1", "value": [ "a" ] },
    { "name": "2", "value": [ "b", "c" ] },
    { "name": "3", "value": [ "d", "e" ] },
    { "name": "4", "value": [ "f", "g" ] },
    { "name": "5", "value": [ "h", "i" ] }
  ]
}

would fix the problem, and you could then use a terms aggregation on "name" with a sub terms aggregation on "value" to get what you want:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attributes": {
      "terms": { "field": "attributes.name", "size": 1000 },
      "aggs": {
        "values": {
          "terms": { "field": "attributes.value", "size": 100 }
        }
      }
    }
  }
}'

Use a nested mapping for the attributes; a possible mapping and the matching query are sketched below.
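A sketch of the nested mapping the answer refers to, assuming an Elasticsearch 5.x/6.x cluster and the placeholder your_index/your_type names used throughout; the keyword field types for name and value are an assumption based on the restructured document:

# Hypothetical index creation with "attributes" mapped as nested:
curl -XPUT "http://localhost:9200/your_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "your_type": {
      "properties": {
        "attributes": {
          "type": "nested",
          "properties": {
            "name":  { "type": "keyword" },
            "value": { "type": "keyword" }
          }
        }
      }
    }
  }
}'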
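With "attributes" mapped as nested, the terms aggregations have to run inside a nested aggregation on that path; a sketch of how the query above could be adapted (bucket sizes carried over from the earlier example):

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attrs": {
      "nested": { "path": "attributes" },
      "aggs": {
        "names": {
          "terms": { "field": "attributes.name", "size": 1000 },
          "aggs": {
            "values": {
              "terms": { "field": "attributes.value", "size": 100 }
            }
          }
        }
      }
    }
  }
}'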