Problem description
I have a table with different dates and different times within each date, and a client ID can appear multiple times per day.
data Data1;
infile datalines delimiter=',';
input date :ddmmyy10. ID $ time :time8.;
format date ddmmyy10.;
format time time8.;
datalines;
05/11/2020,1000,8:15:23
05/11/2020,1000,8:20:10
05/11/2020,1001,8:21:10
05/11/2020,1001,9:05:15
05/11/2020,1001,10:30:20
06/11/2020,1002,8:26:10
06/11/2020,1003,8:27:10
06/11/2020,1003,9:40:01
;
I want to output another column called "Attempt", which would look like this:
data Data1;
infile datalines delimiter=',';
input date :ddmmyy10. ID $ time :time8. Attempt;
format date ddmmyy10.;
format time time8.;
datalines;
05/11/2020,1000,8:15:23,1
05/11/2020,1000,8:20:10,2
05/11/2020,1001,8:21:10,1
05/11/2020,1001,9:05:15,2
05/11/2020,1001,10:30:20,3
06/11/2020,1002,8:26:10,1
06/11/2020,1003,8:27:10,1
06/11/2020,1003,9:40:01,2
06/11/2020,1000,10:20:10,1
06/11/2020,1000,11:20:10,2
06/11/2020,1000,12:20:10,3
06/11/2020,1000,13:20:10,3
;
As you can see, client 1000 appears 2 times on 05/11 and 4 times on 06/11, client 1001 appears 3 times on 05/11, client 1002 appears once on 06/11, and client 1003 appears 2 times on 06/11.
I would also like 3 to be the maximum, so if a client appears more than 3 times on a day (like client 1000 on 06/11), the value stays at 3.
I don't mind using a SAS data step or PROC SQL, so if anyone has an idea in either, please let me know.
Recommended answer
Just use BY-group processing and a retained variable. Reset the counter to 1 on the first row of each date/ID group, and use the MIN() function to cap it at 3.
data Data1;
infile datalines dsd;
input date :ddmmyy. ID $ time :time. expect;
format date ddmmyy10. time time8.;
datalines;
05/11/2020,1000,8:15:23,1
05/11/2020,1000,8:20:10,2
05/11/2020,1001,8:21:10,1
05/11/2020,1001,9:05:15,2
05/11/2020,1001,10:30:20,3
06/11/2020,1000,10:20:10,1
06/11/2020,1000,11:20:10,2
06/11/2020,1000,12:20:10,3
06/11/2020,1000,13:20:10,3
06/11/2020,1002,8:26:10,1
06/11/2020,1003,8:27:10,1
06/11/2020,1003,9:40:01,2
;
data want;
set data1;
by date id ;
retain attempt;
if first.id then attempt=1;
else attempt=min(3,attempt+1);
run;
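Because the sample data above carries the hand-written expected values in the variable expect, a quick sanity check is possible. The following is only a sketch: the dataset name check is made up for illustration, and it assumes the want dataset was created by the step above.

```sas
/* Keep only the rows where the computed counter differs from the */
/* expected value supplied in the sample data.                    */
/* "check" is a hypothetical dataset name used for illustration.  */
data check;
  set want;
  if attempt ne expect;   /* subsetting IF: mismatching rows only */
run;
```

If the logic reproduces the expected values, check should contain no observations.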
Note: I reordered your sample data so that no PROC SORT step is needed. If your real dataset is grouped by date and ID but not sorted, as in your example, you can add the NOTSORTED keyword to the BY statement and the same logic will work.
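The two situations mentioned in the note could be handled roughly as follows. This is a sketch under assumptions: the dataset names sorted, want_grouped and want_sorted are made up for illustration, and data1 is assumed not to already contain a variable named attempt (a variable read by SET would overwrite the retained counter on every row).

```sas
/* Case 1: rows are already grouped by date and ID, but the groups */
/* are not in sorted order. NOTSORTED lets BY processing accept    */
/* this without raising an out-of-order error.                     */
data want_grouped;
  set data1;
  by date id notsorted;
  retain attempt;
  if first.id then attempt = 1;        /* new date/ID group        */
  else attempt = min(3, attempt + 1);  /* count up, capped at 3    */
run;

/* Case 2: rows are in arbitrary order, so sort first. Including   */
/* time in the sort numbers the attempts chronologically.          */
proc sort data=data1 out=sorted;
  by date id time;
run;

data want_sorted;
  set sorted;
  by date id;
  retain attempt;
  if first.id then attempt = 1;
  else attempt = min(3, attempt + 1);
run;
```

Note that NOTSORTED only helps when all rows of a given date and ID are adjacent. If the same client can reappear later on the same day after other clients, the counter would restart, so sorting is the safer choice in that situation.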