问题描述
我有一个包含不平衡观察面板的数据集,我想用最新的非缺失字符串向前和向后填充缺失和/或错误"的股票代码观察结果.
id 时间ticker_haveticker_want------------------------------1 1 ABCDE YYYY1 2 .年年年年1 3 .年年年年1 4 年年年年年年年年1 5 .年年年年------------------------------2 4 .ZZZZZ2 5 ZZZZZ ZZZZZ2 6 .ZZZZZ------------------------------3 1 ..------------------------------4 2 喔喔喔喔喔4 3 喔喔喔喔喔4 4 喔喔喔喔喔
基本上,如果观察已经有一个代码,但这个代码与最新的非空代码不同,我们使用最新的代码替换这个代码.
到目前为止,我已经设法使用此代码填补缺失的观察结果
proc sql;create table have as select * from old_have order by id, time desc;退出;数据需要;降温;设置有;按身份证;/* 保留新变量*/保持温度;长度温度 $ 5;/* 当 BY-Group 改变时重置 TEMP */if first.id then temp=' ';/* 当 X 不丢失时分配 TEMP */如果代码 ne ' ' 那么 temp=ticker;/* 当 X 缺失时,将 TEMP 的保留值赋给 X */否则如果ticker=''则ticker=temp;跑;
现在我被困在找出无法使用 last.ticker
或 first.ticker
访问非缺失值的情况...
如何使用 DATA
或 PROC SQL
或任何其他 SAS 命令来做到这一点?
您可以通过多种方式做到这一点,但带有一些嵌套子查询的 proc sql
是一种解决方案.
(从内到外阅读,#1 然后 2 然后 3.如果有帮助,您可以先将每个子查询构建到数据集中)
过程 sql ;创建表想要作为/* #3 - 匹配 id 上的最后一个代码 */选择 a.id、a.time、a.ticker_have、b.ticker_want从有一个左连接/* #2 - id 和最后一个代码 */(选择 x.id, x.ticker_have 作为ticker_want从有 x内部联接/* #1 - 每个 id 的最大时间 */(选择 id, max(time) 作为 mt从有哪里不缺(ticker_have)按 id 分组)作为 x.id = y.id 上的 y 和 x.time = y.mt)作为 a.id = b.id 上的 b;退出 ;I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-missing string.
id time ticker_have ticker_want
------------------------------
1 1 ABCDE YYYYY
1 2 . YYYYY
1 3 . YYYYY
1 4 YYYYY YYYYY
1 5 . YYYYY
------------------------------
2 4 . ZZZZZ
2 5 ZZZZZ ZZZZZ
2 6 . ZZZZZ
------------------------------
3 1 . .
------------------------------
4 2 OOOOO OOOOO
4 3 OOOOO OOOOO
4 4 OOOOO OOOOO
Basically, if the observation already has a ticker, but this ticker is not the same as the latest non-empty ticker, we replace this ticker using the latest ticker.
So far, I have managed to fill missing observations forward using this code
proc sql;
create table have as select * from old_have order by id, time desc;
quit;
data want;
drop temp;
set have;
by id;
/* RETAIN the new variable*/
retain temp; length temp $ 5;
/* Reset TEMP when the BY-Group changes */
if first.id then temp=' ';
/* Assign TEMP when X is non-missing */
if ticker ne ' ' then temp=ticker;
/* When X is missing, assign the retained value of TEMP into X */
else if ticker=' ' then ticker=temp;
run;
Now I am stuck figuring out the cases where I can't access the non-missing value using last.ticker
or first.ticker
...
How would one do this using DATA
or PROC SQL
or any other SAS commands?
You can do this several ways, but proc sql
with some nested sub-queries is one solution.
(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)
proc sql ; create table want as /* #3 - match last ticker on id */ select a.id, a.time, a.ticker_have, b.ticker_want from have a left join /* #2 - id and last ticker */ (select x.id, x.ticker_have as ticker_want from have x inner join /* #1 - max time with a ticker per id */ (select id, max(time) as mt from have where not missing(ticker_have) group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id ; quit ;
这篇关于在面板 & 中查找第一个非缺失 str 值使用值按组转发和回填(SAS 或 PROC SQL)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!