问题描述
我的bq表架构:
Continuing this post: bigquery pivoting with nested fieldI'm trying to flatten this table. I would like to unnest the timeseries.data fields, i.e. the final number of rows should be equal to the total length of timeseries.data arrays. I would also like to add annotation.properties.key with certain value as additional columns, and annotation.properties.value as its value. So in this case, it would be the "margin" column. However the following query gives me error: "Unrecognized name: data". But after the last FROM, I did already: unnest(timeseries.data) as data.
flow_timestamp, channel_name, number_of_digits, timestamp, value, margin
2019-10-31 15:31:15.079674 UTC, channel_1, 4, 2018-02-28T02:00:00, 50, 0.01
查询:
SELECT
flow_timestamp, timeseries.channel_name,
( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
),
data.timestamp ,data.value
,(with subq as (select * from unnest(data.annotation))
select max(if (properties.key = 'margin', properties.value, null))
from (
select * from unnest(subq.properties)
) as properties
) as margin
FROM my_table
left join unnest(timeseries.data) as data
WHERE DATE(flow_timestamp) between "2019-10-28" and "2019-11-02"
order by flow_timestamp
推荐答案
尝试以下
#standardSQL
SELECT
flow_timestamp,
timeseries.channel_name,
( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
) AS number_of_digits,
item.timestamp,
item.value,
( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL))
FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop
) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data) item
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02'
ORDER BY flow_timestamp
下面是相同解决方案的详细程度稍低的版本,但我通常更喜欢上面的版本,因为它更易于维护
Below is a little less verbose version of the same solution, but I usually prefer above as it simpler to maintain
#standardSQL
SELECT
flow_timestamp,
timeseries.channel_name,
( SELECT MAX(IF(key = 'number_of_digits', value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
) AS number_of_digits,
timestamp,
value,
( SELECT MAX(IF(key = 'margin', value, NULL))
FROM UNNEST(annotation), UNNEST(properties)
) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data)
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02'
ORDER BY flow_timestamp
这篇关于在bigquery中枢转多层嵌套字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!