问题描述
我有一个数据集,我想用最接近的非缺失值填充缺失值.在此的答案中,我找到了两种优雅的解决方案一个>问题,但我不明白他们为什么不为我工作.
I have a dataset in which I want to fill missing values witht the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me.
表格:
create table Tab1(data date, V1 number);
insert into Tab1 values (date '2000-01-01', 1);
insert into Tab1 values (date '2000-02-01', 1);
insert into Tab1 values (date '2000-03-01', 1);
insert into Tab1 values (date '2000-04-01', 1);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
insert into Tab1 values (date '2000-03-01', 2);
insert into Tab1 values (date '2000-04-01', 2);
insert into Tab1 values (date '2000-05-01', NULL);
insert into Tab1 values (date '2000-06-01', NULL);
select * from Tab1;
DATA V1
2000-01-01 1
2000-02-01 1
2000-03-01 1
2000-04-01 1
2000-05-01
2000-06-01
2000-03-01 2
2000-04-01 2
2000-05-01
2000-06-01
尝试#1:
select A.*,
(case when V1 is null then lag(V1 ignore nulls)
over (partition by V1 order by V1, data)
else V1
end) V2
from Tab1 A;
尝试#2:
select A.*,
(case when V1 is null
then last_value(V1 ignore nulls)
over (partition by V1 order by data
range between unbounded preceding and 1 preceding)
else V1
end) V2
from Tab1 A;
两者都给我同样的不良结果:
Both give me the same unwanted result:
DATA V1 V2
2000-01-01 1 1
2000-02-01 1 1
2000-03-01 1 1
2000-04-01 1 1
2000-03-01 2 2
2000-04-01 2 2
2000-05-01
2000-05-01
2000-06-01
我在做什么错了?
推荐答案
您的第一个版本应该可以工作,但要稍作调整:
Your first version should work, with a slight tweak:
select A.*,
coalesce(V1, lag(V1 ignore nulls) over (order by data)) V2
from Tab1 A;
调整是从lag()
中删除partition by v1
. coalesce()
只是我偏爱简单的表达式.
The tweak is to remove the partition by v1
from the lag()
. The coalesce()
is just my preference for simpler expressions.
相同的调整也应适用于第二个版本.
The same tweak should work for the second version as well.
您的版本不起作用,因为lag()
值必须来自同一分区(或为null
).当您拥有partition by v1
时,实际上是在确保v1
具有与当前行相同的值.
Your version doesn't work because the lag()
value must come from the same partition (or be null
). When you have partition by v1
, you are actually ensuring that v1
has the same value as in the current row.
这篇关于Oracle SQL,使用最接近的非缺失值来填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!