本文介绍了如何在BigQuery中使用线性插值法填充不规则的缺失值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有不规则地缺少值的数据,我想使用BigQuery Standard SQL使用线性插值以一定间隔转换它.
具体来说,我有这样的数据:
#数据不规则丢失+ ------ + ------- +|时间|值|+ ------ + ------- +|1 |3.0 ||5 |5.0 ||7 |1.0 ||9 |8.0 ||10 |4.0 |+ ------ + ------- +
,我想按以下方式转换此表:
#以1的间隔插值+ ------ + -------------------- +|时间|value_interpolated |+ ------ + -------------------- +|1 |3.0 ||2 |3.5 ||3 |4.0 ||4 |4.5 ||5 |5.0 ||6 |3.0 ||7 |1.0 ||8 |4.5 ||9 |8.0 ||10 |4.0 |+ ------ + -------------------- +
有什么聪明的解决方案吗?
补充:此问题类似于
I have data which has missing values irregulaly, and I'd like to convert it with a certain interval with liner interpolation using BigQuery Standard SQL.
Specifically, I have data like this:
# data is missing irregulary
+------+-------+
| time | value |
+------+-------+
| 1 | 3.0 |
| 5 | 5.0 |
| 7 | 1.0 |
| 9 | 8.0 |
| 10 | 4.0 |
+------+-------+
and I'd like to convert this table as follows:
# interpolated with interval of 1
+------+--------------------+
| time | value_interpolated |
+------+--------------------+
| 1 | 3.0 |
| 2 | 3.5 |
| 3 | 4.0 |
| 4 | 4.5 |
| 5 | 5.0 |
| 6 | 3.0 |
| 7 | 1.0 |
| 8 | 4.5 |
| 9 | 8.0 |
| 10 | 4.0 |
+------+--------------------+
Any smart soluton for this?
Supplement: this question is similar to this question in stackoverflow but different in that the data is missing irregulaly.
Thank you.
解决方案
Below is for BigQuery Standard SQL
#standardSQL
select time,
ifnull(value, start_value + (end_value - start_value) / (end_tick - start_tick) * (time - start_tick)) as value_interpolated
from (
select time, value,
first_value(tick ignore nulls) over win1 as start_tick,
first_value(value ignore nulls) over win1 as start_value,
first_value(tick ignore nulls) over win2 as end_tick,
first_value(value ignore nulls) over win2 as end_value,
from (
select time, t.time as tick, value
from (
select generate_array(min(time), max(time)) times
from `project.dataset.table`
), unnest(times) time
left join `project.dataset.table` t
using(time)
)
window win1 as (order by time desc rows between current row and unbounded following),
win2 as (order by time rows between current row and unbounded following)
)
if to apply to sample data from your question - output is
这篇关于如何在BigQuery中使用线性插值法填充不规则的缺失值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!