问题描述
我有一个数据集,其中包含一组不平衡的观察结果,我想在其中使用最新的非缺失字符串向前和向后填充缺失和/或错误"的股票代码观察结果.
id time ticker_have ticker_want------------------------------1 1 ABCDE YYYYY1 2 .YYYY1 3 .YYYY1 4 YYYYY YYYYY1 5 .YYYY------------------------------2 4 .ZZZZZ2 5 ZZZZZ ZZZZZ2 6 .ZZZZZ------------------------------3 1 ..------------------------------4 2 哦哦哦哦哦哦4 3 哦哦哦哦哦哦4 4 哦哦哦哦哦哦
基本上,如果观察已经有一个ticker,但这个ticker与最新的非空ticker不一样,我们用最新的ticker替换这个ticker.
到目前为止,我已经设法使用此代码向前填充缺失的观察
proc sql;create table have as select * from old_have order by id, time desc;放弃;数据要;降温;设置有;通过身份证;/* 保留新变量*/保持温度;长度温度 $5;/* 当 BY-Group 改变时重置 TEMP */如果 first.id 然后 temp='';/* 当 X 非缺失时分配 TEMP */如果代码 ne ' ' 那么 temp=ticker;/* 当X缺失时,将TEMP的保留值赋给X */否则如果ticker=''那么ticker=temp;跑步;
现在我被困在无法使用 last.ticker
或 first.ticker
访问非缺失值的情况...
如何使用 DATA
或 PROC SQL
或任何其他 SAS 命令执行此操作?
您可以通过多种方式完成此操作,但是带有一些嵌套子查询的 proc sql
是一种解决方案.
(从内到外阅读,#1 然后 2 然后 3.如果有帮助,您可以先将每个子查询构建到数据集中)
进程 sql ;创建表想要作为/* #3 - 匹配 id 上的最后一个行情 */选择 a.id、a.time、a.ticker_have、b.ticker_want从有一个左连接/* #2 - id 和最后一个行情 */(选择 x.id, x.ticker_have 作为ticker_want从有 x内部联接/* #1 - 每个 ID 有一个自动收报机的最大时间 */(选择 id, max(time) as mt从有哪里没有丢失(ticker_have)group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id;放弃 ;I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-missing string.
id time ticker_have ticker_want
------------------------------
1 1 ABCDE YYYYY
1 2 . YYYYY
1 3 . YYYYY
1 4 YYYYY YYYYY
1 5 . YYYYY
------------------------------
2 4 . ZZZZZ
2 5 ZZZZZ ZZZZZ
2 6 . ZZZZZ
------------------------------
3 1 . .
------------------------------
4 2 OOOOO OOOOO
4 3 OOOOO OOOOO
4 4 OOOOO OOOOO
Basically, if the observation already has a ticker, but this ticker is not the same as the latest non-empty ticker, we replace this ticker using the latest ticker.
So far, I have managed to fill missing observations forward using this code
proc sql;
create table have as select * from old_have order by id, time desc;
quit;
data want;
drop temp;
set have;
by id;
/* RETAIN the new variable*/
retain temp; length temp $ 5;
/* Reset TEMP when the BY-Group changes */
if first.id then temp=' ';
/* Assign TEMP when X is non-missing */
if ticker ne ' ' then temp=ticker;
/* When X is missing, assign the retained value of TEMP into X */
else if ticker=' ' then ticker=temp;
run;
Now I am stuck figuring out the cases where I can't access the non-missing value using last.ticker
or first.ticker
...
How would one do this using DATA
or PROC SQL
or any other SAS commands?
You can do this several ways, but proc sql
with some nested sub-queries is one solution.
(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)
proc sql ; create table want as /* #3 - match last ticker on id */ select a.id, a.time, a.ticker_have, b.ticker_want from have a left join /* #2 - id and last ticker */ (select x.id, x.ticker_have as ticker_want from have x inner join /* #1 - max time with a ticker per id */ (select id, max(time) as mt from have where not missing(ticker_have) group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id ; quit ;
这篇关于在 panel & 中找到第一个非缺失的 str 值使用值按组(SAS 或 PROC SQL)向前和向后填充的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!