本文介绍了LAG()函数错误在BigQuery SQL中;无法返回滞后的hits.page.pagePath的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在尝试在BigQuery表格中创建一个只包含hits.page.pagePath的滞后值的列。对于这个例子,假设我的匹配时间值为0,513,518,762,991。这些值对应于我创建的unique_visit_id。每天有数千次访问,每次访问都有不同的时间值和行号。

为了帮助进行各种计算,我希望在数据框中添加一列,该列仅包含hits.page.pagePath列中所有行的滞后值(当hits.time> 0时没有滞后关联)。请注意,hits.time是一个重复的组,它引用了使用 CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id 计算的唯一访问ID。因此,我怀疑像OVER这样的命令会涉及。我的第一个查询使用了以下方法:

$ p $ SELECT date,LAG(hits.page.pagePath,1),CONCAT(fullVisitorId, STRING(visitId))AS unique_visit_id,visitId,visitNumber,fullVisitorId,totals.pageviews,totals.bounces,
hits.page.pagePath,device.deviceCategory,device.browser,device.browserVersion,hits.customVariables.index,
hits.customVariables.customVarName,hits.customVariables.customVarValue,hits.time
FROM(FLATTEN([XXXXXXXX.ga_sessions_20140711],hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC,hits.time ASC
LIMIT 1000;

因此不幸地返回错误:

错误:LAG是一个分析函数,必须附带OVER子句。 / strong>



我猜测OVER会引用 CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id 因为包含相同unique_visit_id的每行代表一次访问中的一次命中。不幸的是,在这里使用OVER也会返回一个错误。

解决方案

分析函数有一个奇怪的语法......你几乎在那里,但您只需将 ORDER BY 子句移动到使用的 OVER 语句中, LAG()

 选择日期,LAG(hits.page.pagePath,1)
OVER(PARTITION BY unique_visit_id ORDER BY hits.time ASC)滞后,
CONCAT(fullVisitorId,STRING(visitId))AS unique_visit_id,visitId,
...
FROM(FLATTEN( ),
WHERE hits.customVariables.index = 4
LIMIT 1000;


I am currently trying to create a column within a BigQuery table that simply includes the lagged values of the hits.page.pagePath. For this example, imagine that I have hits.time values of 0, 513, 518, 762, 991. These values correspond to a unique_visit_id which I created. There are thousands of unique visits per day, each with different hits.time values and row numbers.

To help with a variety of calculations, I want to add a column to the data frame that simply includes the lagged value for all rows in the hits.page.pagePath column (with there being no lag associated when hits.time > 0). Note that the hits.time is a repeated group that refers to a unique visit id which is calculated using CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id. Thus, I suspect that a command like OVER would be involved. My initial query used the following approach:

SELECT date, LAG(hits.page.pagePath,1), CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces, 
hits.page.pagePath, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;

Thus unfortunately returned the error:

Error: LAG is an analytic function and must be accompanied by an OVER clause.

Job ID: XXXXXXXX-XXX:job_zduhuRKgvrgmA7niBzcyb3empwY

I have guessed that OVER would reference the CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id since each row containing the same unique_visit_id represents a hit inside a unique visit. Unfortunately, using an OVER here has also returned an error.

解决方案

Analytic functions have an odd syntax... you're almost there, but you just need to move the ORDER BY clause into the OVER statement used by LAG():

SELECT date, LAG(hits.page.pagePath,1) 
  OVER(PARTITION BY unique_visit_id ORDER BY hits.time ASC) as lagged, 
  CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, 
  ...
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
LIMIT 1000;

这篇关于LAG()函数错误在BigQuery SQL中;无法返回滞后的hits.page.pagePath的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-12 03:25