In the Google Analytics export to BigQuery, the following query tries to aggregate hit-level information at the session level:

SELECT
  visitId,
  trafficSource.source as source,
  trafficSource.medium as medium,
  device.browser as browser,
  min(hits.hour) as firstHitHour,
  boolean(count(hits.eventInfo.eventCategory = "SomeValue")) as hasSomeValue
FROM
  [my-table.ga_sessions_20150216]
GROUP BY
  visitId, source, medium, browser;


The line boolean(count(hits.eventInfo.eventCategory = "SomeValue")) as hasSomeValue should be true if, among all the hits of a session row, at least one hit has hits.eventInfo.eventCategory equal to SomeValue.

The expected result looks like this:

sessionId   source         medium      browser      firstHitHour     hasSomeValue
--------------------------------------------------------------------------------------
12318       google         cpc         firefox      12               true
13317       google         organic     safari       14               null
13551       bing           organic     firefox      14               true
13610       orange         display     chrome       14               true
14381       stackoverflow  referral    safari       15               false
14422       google         organic     chrome       15               true


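However, the line above does not seem to work. Even when I use a dummy value, such as hits.eventInfo.eventCategory = "Blablablabla", it still outputs true for some rows (obviously, no hit has this dummy value).

In fact, the expression count(hits.eventInfo.eventCategory = "Blablablabla"), without boolean(), returns seemingly random results that have nothing in common with the actual counts.

What should that line be to produce the correct output?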

Best answer

This is easier with standard SQL. To check whether the value is present in hits, use an EXISTS clause over it. For example,

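SELECT
  visitId,
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  device.browser AS browser,
  -- Earliest hour among this session's hits
  (SELECT MIN(hour) FROM UNNEST(hits)) AS firstHitHour,
  -- TRUE if at least one hit in this session has the category
  EXISTS (SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventCategory = "SomeValue") AS hasSomeValue
FROM
  `my-table.ga_sessions_20150216`;
-- No GROUP BY: each row of ga_sessions_* is already one session, and grouping
-- would reject the subqueries above because hits is neither grouped nor aggregated.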


See also the guide on migrating from legacy to standard SQL.
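As for the seemingly random counts in the question, a likely cause is that COUNT(expression) counts every row where the expression is non-NULL, so a comparison that evaluates to false is still counted. The sketch below, in standard SQL, contrasts COUNT with COUNTIF on a small inline table. The hits CTE and its four rows are made-up illustration data, not the real ga_sessions schema.

```sql
-- BigQuery standard SQL. The inline rows are hypothetical illustration data.
WITH hits AS (
  SELECT 1 AS visitId, 'SomeValue' AS eventCategory UNION ALL
  SELECT 1, 'Other' UNION ALL
  SELECT 2, 'Other' UNION ALL
  SELECT 2, CAST(NULL AS STRING)   -- a hit with no event category
)
SELECT
  visitId,
  -- Counts rows where the comparison is non-NULL, including FALSE results
  COUNT(eventCategory = 'Blablablabla') AS countOfComparison,
  -- Counts only rows where the comparison is TRUE
  COUNTIF(eventCategory = 'Blablablabla') AS countIfTrue,
  -- TRUE when at least one hit has the wanted category
  COUNTIF(eventCategory = 'SomeValue') > 0 AS hasSomeValue
FROM hits
GROUP BY visitId
ORDER BY visitId;
```

For visitId 1 this returns countOfComparison = 2, countIfTrue = 0, hasSomeValue = true; for visitId 2 it returns 1, 0, false. COUNT is nonzero for the dummy value even though no hit matches it, which is consistent with boolean(count(...)) reporting true in the question.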

If you really do want to use legacy SQL, you need to use WITHIN RECORD combined with a count over hits.
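A minimal, untested sketch of that legacy SQL approach might look like the following. It assumes the same table as in the question. The alias someValueHits and the outer comparison are illustrative choices, not part of the original answer.

```sql
-- BigQuery legacy SQL sketch (untested). WITHIN RECORD scopes each aggregate
-- to the repeated hits of a single session row, so no GROUP BY is needed.
SELECT
  visitId,
  source,
  medium,
  browser,
  firstHitHour,
  someValueHits > 0 AS hasSomeValue   -- true if at least one hit matched
FROM (
  SELECT
    visitId,
    trafficSource.source AS source,
    trafficSource.medium AS medium,
    device.browser AS browser,
    MIN(hits.hour) WITHIN RECORD AS firstHitHour,
    -- someValueHits is a hypothetical alias: number of matching hits per session
    SUM(IF(hits.eventInfo.eventCategory = "SomeValue", 1, 0)) WITHIN RECORD AS someValueHits
  FROM
    [my-table.ga_sessions_20150216]
);
```

Summing an explicit 1/0 flag avoids the pitfall of COUNT over a comparison, which also counts hits where the comparison is false.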

Regarding google-analytics - aggregating hits and session data from an Analytics query in BigQuery, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40782028/

10-11 08:48