Problem description
I am having a hard time converting the simple SQL query below into a Druid query:
SELECT country, city, COUNT(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;
So far I have come up with this query:
{
  "queryType": "groupBy",
  "dataSource": "people_data",
  "granularity": "all",
  "metric": "num_of_pages",
  "dimensions": ["country", "city"],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "in",
        "dimension": "name",
        "values": ["Mary"]
      },
      {
        "type": "javascript",
        "dimension": "email",
        "function": "function(value) { return (value.length !== 0) }"
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": [ "2016-07-20/2016-07-21" ]
}
The query above runs, but it doesn't seem like the groupBy is even being evaluated against the Druid datasource, since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?
Recommended answer
The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking, even the SQL query would not make sense if it selected such columns. If, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using the max function.
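The aggregation point can be illustrated in plain SQL. The following is a minimal sketch using Python's built-in sqlite3 module with a small in-memory table; the sample rows and the street column are made up for illustration and are not taken from the original data.

```python
import sqlite3

# In-memory table with hypothetical sample rows (not the asker's real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_data (name TEXT, street TEXT, country TEXT, city TEXT)"
)
rows = [
    ("Mary", "Main St", "US", "Boston"),
    ("Mary", "Oak Ave", "US", "Boston"),
    ("Anna", "Elm St", "US", "Boston"),
    ("Mary", "High St", "UK", "London"),
]
conn.executemany("INSERT INTO people_data VALUES (?, ?, ?, ?)", rows)

# Each (country, city) group collapses to one row, so every column that is
# not a grouping key has to go through an aggregate such as MAX or COUNT.
result = conn.execute(
    """
    SELECT country, city, MAX(name), MAX(street), COUNT(*)
    FROM people_data
    GROUP BY country, city
    ORDER BY country, city
    """
).fetchall()

for row in result:
    print(row)
```

Each group yields exactly one row, and MAX simply picks one representative value per group, which is why non-grouped columns cannot be selected directly.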
In this case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and include a corresponding aggregation over the metric, max(name_metric).
Please note that if these columns (name etc.) have high-cardinality values, i.e. many distinct values, including them as dimensions will largely defeat Druid's roll-up feature.
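For reference, here is a hedged sketch of a groupBy query that mirrors the SQL in the question, built as a Python dict and printed as JSON. It assumes that name was ingested as a dimension (so it can be filtered on) and that a count metric was defined at ingestion time, as the asker's query implies. The helper name build_group_by_query and the output name row_count are hypothetical.

```python
import json


def build_group_by_query(name):
    """Build a Druid native groupBy query filtered on one name (sketch)."""
    return {
        "queryType": "groupBy",
        "dataSource": "people_data",
        "granularity": "all",
        # Only the grouping keys appear as dimensions in the result.
        "dimensions": ["country", "city"],
        # A selector filter plays the role of WHERE name = 'Mary';
        # this assumes `name` is a dimension in the datasource.
        "filter": {"type": "selector", "dimension": "name", "value": name},
        # Summing the ingestion-time `count` metric approximates COUNT(*)
        # when roll-up is enabled (assumed from the asker's query).
        "aggregations": [
            {"type": "longSum", "name": "row_count", "fieldName": "count"}
        ],
        "intervals": ["2016-07-20/2016-07-21"],
    }


query = build_group_by_query("Mary")
print(json.dumps(query, indent=2))
```

The filter restricts which rows are aggregated, while name itself never appears in the output, which matches the SQL: the result has one row per country and city with a count.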