问题描述
以下两个版本的完全相同的查询
版本 1(在内部 SELECT 中使用 k 作为别名):
SELECT k, w_vol,ROW_NUMBER() OVER (ORDER BY k DESC) AS rank1,ROW_NUMBER() OVER (ORDER BY w_vol DESC) AS rank2从 (SELECT w_vol, c_date AS k FROM(SELECT 1590 AS c_date, 1 AS w_vol),(SELECT 1599 AS c_date, 1 AS w_vol),(SELECT 1602 AS c_date, 1 AS w_vol),(SELECT 1609 AS c_date, 2 AS w_vol),(SELECT 1610 AS c_date, 1 AS w_vol),)按 1 订购
版本 2(在内部 SELECT 中使用 l 作为别名):
SELECT l, w_vol,ROW_NUMBER() OVER (ORDER BY l DESC) AS rank1,ROW_NUMBER() OVER (ORDER BY w_vol DESC) AS rank2从 (SELECT w_vol, c_date AS l FROM(SELECT 1590 AS c_date, 1 AS w_vol),(SELECT 1599 AS c_date, 1 AS w_vol),(SELECT 1602 AS c_date, 1 AS w_vol),(SELECT 1609 AS c_date, 2 AS w_vol),(SELECT 1610 AS c_date, 1 AS w_vol),)按 1 订购
下面是我一直为两个查询得到的输出(注意没有缓存结果
被使用)
我对理解为什么会产生这个或那个结果没有问题(它相对微不足道) - 但是 - 无论使用什么别名,我都希望结果是一样的 - 它根本就是别名!
问题:为什么我总是在版本 1 中得到一个结果而在版本 2 中得到另一个结果?
注意:我对如何以不同方式重写查询以隐藏"问题的建议不感兴趣!请不要打扰这个方向的答案!
忘了说:
大多数别名产生与 'l' 相同的结果,但很少有与 'k' 相同的结果.另一个例子是x".只是想说明一下 - 有问题的不仅仅是一个特定的别名.甚至更多 - 有问题的别名取决于文件名.
所以我认为这是一些涉及字段/别名的哈希问题 - 但这只是我的疯狂猜测!
我没有对该行为的具体解释,但请注意两个结果仍然正确:两者之间的唯一区别结果是,当对被排序的列 (w_vol
) 具有相同值 (1
) 的几行进行排序时,选择了不同的顺序.
一般来说,当将 ORDER BY
应用于相等的值时,BigQuery 不承诺任何特定的排序,并且该排序可能因任何原因在同一查询的不同执行中有所不同.
我同意有趣的是,排序似乎取决于别名,但我可以看到很多合理的解释为什么会发生这种情况——例如,引擎可能会遍历使用分析函数的字段列表,并更改别名可能会改变对这些分析函数求值的顺序,这反过来又会在按一列排序时改变具有相等值的行的顺序.
Below two versions of the same exactly query
SELECT k, w_vol,
ROW_NUMBER() OVER (ORDER BY k DESC) AS rank1,
ROW_NUMBER() OVER (ORDER BY w_vol DESC) AS rank2
FROM (
SELECT w_vol, c_date AS k FROM
(SELECT 1590 AS c_date, 1 AS w_vol),
(SELECT 1599 AS c_date, 1 AS w_vol),
(SELECT 1602 AS c_date, 1 AS w_vol),
(SELECT 1609 AS c_date, 2 AS w_vol),
(SELECT 1610 AS c_date, 1 AS w_vol),
)
ORDER BY 1
SELECT l, w_vol,
ROW_NUMBER() OVER (ORDER BY l DESC) AS rank1,
ROW_NUMBER() OVER (ORDER BY w_vol DESC) AS rank2
FROM (
SELECT w_vol, c_date AS l FROM
(SELECT 1590 AS c_date, 1 AS w_vol),
(SELECT 1599 AS c_date, 1 AS w_vol),
(SELECT 1602 AS c_date, 1 AS w_vol),
(SELECT 1609 AS c_date, 2 AS w_vol),
(SELECT 1610 AS c_date, 1 AS w_vol),
)
ORDER BY 1
Below is output i am consistently getting for both queries (note No Cached Results
is used)
I don't have problem with understanding why this or that result is generated (it is relatively trivial) - BUT - I expected result be the same no matter what alias is used - it is just alias at all!
Note: I am not interested in recommendations on how to re-write query differently to "hide" the problem! Please do not bother with answers in this direction!
Forgot to mention:
Most of aliases produce same result as with 'l', but very few as with'k'. Another example of such is 'x'. Just wanted to make it clear - it is not just one particular alias that is problematic. And even more - problematic alias depends on name of fileds.
So i think it is some hashing issue that involves fields/aliases names - but this is just my wild guess!
I don't have a specific explanation for the behavior, but note that both results are still correct: the only difference between the results is that a different order was chosen when sorting several rows that have the same value (1
) for the column being sorted (w_vol
).
Generally speaking, BigQuery does not promise any particular ordering when applying ORDER BY
to equal values, and that ordering may vary across different executions of the same query for any reason.
I agree it's interesting that the ordering seems to depend on the alias, but I could see plenty of plausible explanations for why this happens--for example, the engine might iterate through a list of fields that use analytic functions, and changing the alias might change the order in which those analytic functions are evaluated, which in turn changes the ordering of rows with equal values when sorting by one column.
这篇关于不同的别名为完全相同的查询产生不同的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!