How do I perform a SELECT on the results returned by a GROUP BY in Druid?



I am having a hard time converting this simple SQL Query below into Druid:

SELECT country, city, COUNT(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;


So far I have come up with this query:

{
  "queryType": "groupBy",
  "dataSource": "people_data",
  "granularity": "all",
  "metric": "num_of_pages",
  "dimensions": ["country", "city"],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "in",
        "dimension": "name",
        "values": ["Mary"]
      },
      {
        "type": "javascript",
        "dimension": "email",
        "function": "function(value) { return (value.length !== 0) }"
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": ["2016-07-20/2016-07-21"]
}


The query above runs, but the groupBy on the Druid datasource doesn't seem to be evaluated at all: my output includes people with names other than Mary. Does anyone have any input on how to make this work?



The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.

Strictly speaking, even the SQL query does not make sense. If a given combination of country and city has many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using a max function.
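To make the aggregation point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Druid. The sample rows are made up for illustration; only the names people_data, country, city, name and street come from the thread. Each (country, city) group collapses to one row because every non-grouped column goes through an aggregate.

```python
import sqlite3

# Hypothetical sample rows: (country, city, name, street).
ROWS = [
    ("US", "Boston", "Mary", "Elm St"),
    ("US", "Boston", "Mary", "Oak St"),
    ("US", "Austin", "Mary", "Main St"),
    ("US", "Austin", "John", "Pine St"),
]


def grouped_counts(rows):
    """Group Mary's rows by (country, city), aggregating the other columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people_data (country TEXT, city TEXT, name TEXT, street TEXT)"
    )
    conn.executemany("INSERT INTO people_data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT country, city, MAX(street) AS max_street, COUNT(*) AS n
        FROM people_data
        WHERE name = 'Mary'
        GROUP BY country, city
        ORDER BY country, city
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in grouped_counts(ROWS):
        print(row)
```

Boston has two different streets for Mary, so a single output row can only carry one of them; MAX(street) is one arbitrary but well-defined way to choose.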


In this case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and add a corresponding aggregation over the metric, such as max(name_metric).


Please note that if these columns (name etc.) have high-cardinality values, i.e. many distinct values, that will largely defeat Druid's roll-up feature.
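The effect on roll-up can be sketched with a simplified model in plain Python. This is not Druid code: it ignores timestamps and real aggregators, and simply treats roll-up as collapsing events that share the same dimension values into one stored row with a count. The events and the helper name rolled_up_rows are hypothetical.

```python
from collections import Counter

# Hypothetical raw events: (country, city, name).
EVENTS = [
    ("US", "Boston", "Mary"),
    ("US", "Boston", "Ann"),
    ("US", "Boston", "Mary"),
    ("US", "Austin", "Bob"),
    ("US", "Austin", "Eve"),
    ("US", "Austin", "Mary"),
]


def rolled_up_rows(events, dimension_indexes):
    """Collapse events with identical dimension values into one counted row."""
    return Counter(tuple(event[i] for i in dimension_indexes) for event in events)


if __name__ == "__main__":
    coarse = rolled_up_rows(EVENTS, [0, 1])    # dimensions: country, city
    fine = rolled_up_rows(EVENTS, [0, 1, 2])   # dimensions: country, city, name
    print(len(EVENTS), "raw events")
    print(len(coarse), "stored rows with dimensions (country, city)")
    print(len(fine), "stored rows with dimensions (country, city, name)")
```

Adding name as a dimension leaves almost as many stored rows as raw events, which is the loss of roll-up the note above warns about.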
