问题描述
我目前正在尝试在BigQuery表格中创建一个只包含hits.page.pagePath的滞后值的列。对于这个例子,假设我的匹配时间值为0,513,518,762,991。这些值对应于我创建的unique_visit_id。每天有数千次访问,每次访问都有不同的时间值和行号。
为了帮助进行各种计算,我希望在数据框中添加一列,该列仅包含hits.page.pagePath列中所有行的滞后值(当hits.time> 0时没有滞后关联)。请注意,hits.time是一个重复的组,它引用了使用 CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id
计算的唯一访问ID。因此,我怀疑像OVER这样的命令会涉及。我的第一个查询使用了以下方法:
$ p $ SELECT date,LAG(hits.page.pagePath,1),CONCAT(fullVisitorId, STRING(visitId))AS unique_visit_id,visitId,visitNumber,fullVisitorId,totals.pageviews,totals.bounces,
hits.page.pagePath,device.deviceCategory,device.browser,device.browserVersion,hits.customVariables.index,
hits.customVariables.customVarName,hits.customVariables.customVarValue,hits.time
FROM(FLATTEN([XXXXXXXX.ga_sessions_20140711],hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC,hits.time ASC
LIMIT 1000;
因此不幸地返回错误:
错误:LAG是一个分析函数,必须附带OVER子句。 / strong>
我猜测OVER会引用 CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id
因为包含相同unique_visit_id的每行代表一次访问中的一次命中。不幸的是,在这里使用OVER也会返回一个错误。
分析函数有一个奇怪的语法......你几乎在那里,但您只需将 ORDER BY
子句移动到使用的
: OVER
语句中, LAG()
选择日期,LAG(hits.page.pagePath,1)
OVER(PARTITION BY unique_visit_id ORDER BY hits.time ASC)滞后,
CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id,visitId,
...
FROM(FLATTEN( ),
WHERE hits.customVariables.index = 4
LIMIT 1000;
I am currently trying to create a column within a BigQuery table that simply includes the lagged values of the hits.page.pagePath. For this example, imagine that I have hits.time values of 0, 513, 518, 762, 991. These values correspond to a unique_visit_id which I created. There are thousands of unique visits per day, each with different hits.time values and row numbers.
To help with a variety of calculations, I want to add a column to the data frame that simply includes the lagged value for all rows in the hits.page.pagePath column (with there being no lag associated when hits.time > 0). Note that the hits.time is a repeated group that refers to a unique visit id which is calculated using CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id
. Thus, I suspect that a command like OVER would be involved. My initial query used the following approach:
SELECT date, LAG(hits.page.pagePath,1), CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces,
hits.page.pagePath, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;
Thus unfortunately returned the error:
Error: LAG is an analytic function and must be accompanied by an OVER clause.
Job ID: XXXXXXXX-XXX:job_zduhuRKgvrgmA7niBzcyb3empwY
I have guessed that OVER would reference the CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id
since each row containing the same unique_visit_id represents a hit inside a unique visit. Unfortunately, using an OVER here has also returned an error.
Analytic functions have an odd syntax... you're almost there, but you just need to move the ORDER BY
clause into the OVER
statement used by LAG()
:
SELECT date, LAG(hits.page.pagePath,1)
OVER(PARTITION BY unique_visit_id ORDER BY hits.time ASC) as lagged,
CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId,
...
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
LIMIT 1000;
这篇关于LAG()函数错误在BigQuery SQL中;无法返回滞后的hits.page.pagePath的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!