本文介绍了根据按特定日期长度的情节分组的行将新列添加到data.frame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述 29岁程序员,3月因学历无情被辞! df = read.table(text = 'ID Day Count Count_sum33021 9535 3 2933029 9535 3 2934001 9535 3 2932010 9534 2 2933023 9534 2 2945012 9533 4 2947001 9533 4 2948010 9533 4 2950001 9533 4 2949004 9532 1 299002 9531 2 2967008 9531 2 2940011 9530 1 2942003 9529 2 2942011 9529 2 2955023 9528 1 2940012 9527 3 2943007 9527 3 2947011 9527 3 2952004 9526 4 2952005 9526 4 2952006 9526 4 2952007 9526 4 2919001 9525 1 2957008 9524 5 2957010 9524 5 2958006 9524 5 2958008 9524 5 2959001 9524 5 2958008 9537 3 2766001 9537 3 2768001 9537 3 2754057 9536 1 2733021 9535 3 2733029 9535 3 2734001 9535 3 2732010 9534 2 2733023 9534 2 2732010 9534 2 2733023 9534 2 2745012 9533 4 2747001 9533 4 2748010 9533 4 2750001 9533 4 2745012 9533 4 2747001 9533 4 2748010 9533 4 2750001 9533 4 2749004 9532 1 2749004 9532 1 279002 9531 2 2767008 9531 2 279002 9531 2 2767008 9531 2 2740011 9530 1 2740011 9530 1 2742003 9529 2 2742011 9529 2 2742003 9529 2 2742011 9529 2 2755023 9528 1 2755023 9528 1 2740012 9527 3 2743007 9527 3 2747011 9527 3 2740012 9527 3 2743007 9527 3 2747011 9527 3 2752004 9526 4 2752005 9526 4 2752006 9526 4 2752007 9526 4 2752004 9526 4 2752005 9526 4 2752006 9526 4 2752007 9526 4 2719001 9525 1 2757008 9524 5 2757010 9524 5 2758006 9524 5 2758008 9524 5 2759001 9524 5 2765004 9523 1 2749004 9532 1 269002 9531 2 2667008 9531 2 2640011 9530 1 2642003 9529 2 2642011 9529 2 2655023 9528 1 2640012 9527 3 2643007 9527 3 2647011 9527 3 2652004 9526 4 2652005 9526 4 2652006 9526 4 2652007 9526 4 2619001 9525 1 2657008 9524 5 2657010 9524 5 2658006 9524 5 2658008 9524 5 2659001 9524 5 2665004 9523 1 2675003 9522 1 2676007 9521 4 2677002 9521 4 2677003 9521 4 2678003 9521 4 2648007 9538 2 2548011 9538 2 2558008 9537 3 2566001 9537 3 2568001 9537 3 2554057 9536 1 2554057 9536 1 2533021 9535 3 2533029 9535 3 2534001 9535 3 2533021 9535 3 2533029 9535 3 2534001 9535 3 2532010 9534 2 2533023 9534 2 2532010 9534 2 2533023 9534 2 2545012 9533 4 2547001 9533 4 2548010 9533 4 2550001 9533 4 2545012 9533 4 2547001 9533 4 2548010 9533 4 2550001 9533 4 2545012 9533 4 2547001 9533 4 2548010 9533 4 2550001 9533 4 2549004 9532 1 2549004 9532 1 2549004 9532 1 259002 9531 2 2567008 9531 2 259002 9531 2 2567008 9531 2 259002 9531 2 2567008 9531 2 259002 9531 2 2567008 9531 2 2540011 9530 1 2540011 9530 1 2540011 9530 1 2540011 9530 1 2542003 9529 2 2542011 9529 2 2542003 9529 2 2542011 9529 2 2542003 9529 2 2542011 9529 2 2542003 9529 2 2542011 9529 2 2555023 9528 1 2555023 9528 1 2555023 9528 1 2555023 9528 1 2540012 9527 3 2543007 9527 3 2547011 9527 3 2540012 9527 3 2543007 9527 3 2547011 9527 3 2540012 9527 3 2543007 9527 3 2547011 9527 3 2540012 9527 3 2543007 9527 3 2547011 9527 3 2552004 9526 4 2552005 9526 4 2552006 9526 4 2552007 9526 4 2552004 9526 4 2552005 9526 4 2552006 9526 4 2552007 9526 4 2552004 9526 4 2552005 9526 4 2552006 9526 4 2552007 9526 4 2519001 9525 1 2519001 9525 1 2519001 9525 1 2557008 9524 5 2557010 9524 5 2558006 9524 5 2558008 9524 5 2559001 9524 5 2557008 9524 5 2557010 9524 5 2558006 9524 5 2558008 9524 5 2559001 9524 5 2565004 9523 1 2565004 9523 1 2575003 9522 1 2575003 9522 1 2576007 9521 4 2577002 9521 4 2577003 9521 4 2578003 9521 4 2574001 9520 1 2539093 9539 2 2441006 9539 2 2448007 9538 2 2448011 9538 2 2458008 9537 3 2466001 9537 3 2468001 9537 3 2454057 9536 1 2433021 9535 3 2433029 9535 3 2434001 9535 3 2432010 9534 2 2433023 9534 2 2445012 9533 4 2447001 9533 4 2448010 9533 4 2450001 9533 4 2449004 9532 1 249002 9531 2 2467008 9531 2 2440011 9530 1 2440011 9530 1 2442003 9529 2 2442011 9529 2 2442003 9529 2 2442011 9529 2 2442003 9529 2 2442011 9529 2 2455023 9528 1 2455023 9528 1 2455023 9528 1 2440012 9527 3 2443007 9527 3 2447011 9527 3 2440012 9527 3 2443007 9527 3 2447011 9527 3 2452004 9526 4 2452005 9526 4 2452006 9526 4 2452007 9526 4 2452004 9526 4 2452005 9526 4 2452006 9526 4 2452007 9526 4 2419001 9525 1 2419001 9525 1 2457008 9524 5 2457010 9524 5 2458006 9524 5 2458008 9524 5 2459001 9524 5 2457008 9524 5 2457010 9524 5 2458006 9524 5 2458008 9524 5 2459001 9524 5 2465004 9523 1 2465004 9523 1 2475003 9522 1 2475003 9522 1 2476007 9521 4 2477002 9521 4 2477003 9521 4 2478003 9521 4 2476007 9521 4 2477002 9521 4 2477003 9521 4 2478003 9521 4 2474001 9520 1 2474001 9520 1 2433021 9518 1 2455023 9528 1 2240012 9527 3 2243007 9527 3 2247011 9527 3 2252004 9526 4 2252005 9526 4 2252006 9526 4 2252007 9526 4 2219001 9525 1 2257008 9524 5 2257010 9524 5 2258006 9524 5 2258008 9524 5 2259001 9524 5 2265004 9523 1 2275003 9522 1 2276007 9521 4 2277002 9521 4 2277003 9521 4 2278003 9521 4 2274001 9520 1 2233021 9518 1 2240012 9527 3 2143007 9527 3 2147011 9527 3 2152004 9526 4 2152005 9526 4 2152006 9526 4 2152007 9526 4 2119001 9525 1 2157008 9524 5 2157010 9524 5 2158006 9524 5 2158008 9524 5 2159001 9524 5 2165004 9523 1 2175003 9522 1 2176007 9521 4 2177002 9521 4 2177003 9521 4 2178003 9521 4 2174001 9520 1 2133021 9518 1 2152004 9526 4 1852005 9526 4 1852006 9526 4 1852007 9526 4 1819001 9525 1 1857008 9524 5 1857010 9524 5 1858006 9524 5 1858008 9524 5 1859001 9524 5 1865004 9523 1 1875003 9522 1 1876007 9521 4 1877002 9521 4 1877003 9521 4 1878003 9521 4 1874001 9520 1 1833021 9518 1 1819001 9525 1 1457008 9524 5 1457010 9524 5 1458006 9524 5 1458008 9524 5 1459001 9524 5 1465004 9523 1 1475003 9522 1 1476007 9521 4 1477002 9521 4 1477003 9521 4 1478003 9521 4 1474001 9520 1 1433021 9518 1 1457008 9524 5 1357010 9524 5 1358006 9524 5 1358008 9524 5 1359001 9524 5 1365004 9523 1 1375003 9522 1 1376007 9521 4 1377002 9521 4 1377003 9521 4 1378003 9521 4 1374001 9520 1 1333021 9518 1 1365004 9523 1 875003 9522 1 876007 9521 4 877002 9521 4 877003 9521 4 878003 9521 4 874001 9520 1 833021 9518 1 875003 9522 1 776007 9521 4 777002 9521 4 777003 9521 4 778003 9521 4 774001 9520 1 733021 9518 1 776007 9521 4 677002 9521 4 677003 9521 4 678003 9521 4 674001 9520 1 633021 9518 1 674001 9520 1 233021 9518 1 233021 9518 1 1', header = TRUE) Day 列显示天; Count 列显示该特定日期ID的总和; Count_sum 列显示 ID 以12天为单位,即Day + Day -1 + Day -2 + Day -3 + Day -4 + Day -5 + Day -6 + Day -7 + Day -8 + Day -9 + -10天+第-11天。Day column shows days;Count column shows the sum of ID in that particular Day;Count_sum column shows the sum of ID by blocks of 12 days, i.e. Day + Day -1 + Day -2 + Day -3 + Day -4 + Day -5 + Day -6 + Day -7 + Day -8 + Day -9 + Day -10 + Day -11.例如 1)Count_sum = 29,因为它代表3(第9535天)+ 2(第9534天)+ 4(第9533天)+ 1(第9532天)+ 2(第9531天)+ 1(第9530天)的总和+ 2(第9529天)+ 1(第9528天)+ 3(第9527天)+ 4(第9526天)+ 1(第9525天)+ 5(第9524天);e.g.1) Count_sum = 29 because it represents the sum of 3 (Day 9535) + 2 (Day 9534) + 4 (Day 9533) + 1 (Day 9532) + 2 (Day 9531) + 1 (Day 9530) + 2 (Day 9529) + 1 (Day 9528) + 3 (Day 9527) + 4 (Day 9526) + 1 (Day 9525) + 5 (Day 9524); 2)Count_sum = 27,因为3(第9537天)+ 1(第9536天)+ 3(第9535天)+ 2(第9534天)+ 4(第9533天)+ 1(第9532天)+ 2(第9531天) + 1(第9530天)+ 2第9529天)+1(第9528天)+3(第9527天)+ 4(第9526天);2) Count_sum = 27 because of 3 (Day 9537) + 1 (Day 9536) + 3 (Day 9535) + 2 (Day 9534) + 4 (Day 9533) + 1 (Day 9532) + 2 (Day 9531) + 1 (Day 9530) + 2 Day 9529) + 1 (Day 9528) + 3 (Day 9527) + 4 (Day 9526);等,依此类推。我需要做的是向 df 中添加第5列(Episode_ID),每列12天单集的值介于1到21之间(因为 df 有21天是不重复的天)。What I need to do is to add a 5th column (Episode_ID) to df which groups each 12-day episodes with a unique value from 1 to 21 (because in df there are 21 unique days). Count_sum近正确地将它们分组,但是可能有2个或更多的12天情节具有相同的Count_sum值,并且在几天之内也可能重叠。Count_sum nearly group them correctly but there can be 2 or more 12-day episodes with same Count_sum values and that can also overlap within days.我的真实data.frame包含> 300,000行而且我还想获得一个适用于12天情节的代码(例如 df ),也适用于其他数据。按2、3、4、5、6、7、8,n天分组的帧。My real data.frame contains >300,000 rows and I would like also to obtain a code that works for 12-day episodes (as df) but also for other data.frames grouped by 2,3,4,5,6,7,8,n days.这是我对 df (12天的情节)的预期输出:Here my expected output for df (12-day episodes):ID Day Count Count_sum Episode_ID33021 9535 3 29 133029 9535 3 29 134001 9535 3 29 132010 9534 2 29 133023 9534 2 29 145012 9533 4 29 147001 9533 4 29 148010 9533 4 29 150001 9533 4 29 149004 9532 1 29 19002 9531 2 29 167008 9531 2 29 140011 9530 1 29 142003 9529 2 29 142011 9529 2 29 155023 9528 1 29 140012 9527 3 29 143007 9527 3 29 147011 9527 3 29 152004 9526 4 29 152005 9526 4 29 152006 9526 4 29 152007 9526 4 29 119001 9525 1 29 157008 9524 5 29 157010 9524 5 29 158006 9524 5 29 158008 9524 5 29 159001 9524 5 29 158008 9537 3 27 266001 9537 3 27 268001 9537 3 27 254057 9536 1 27 233021 9535 3 27 233029 9535 3 27 234001 9535 3 27 232010 9534 2 27 233023 9534 2 27 245012 9533 4 27 247001 9533 4 27 248010 9533 4 27 250001 9533 4 27 249004 9532 1 27 29002 9531 2 27 267008 9531 2 27 240011 9530 1 27 242003 9529 2 27 242011 9529 2 27 255023 9528 1 27 240012 9527 3 27 243007 9527 3 27 247011 9527 3 27 252004 9526 4 27 252005 9526 4 27 252006 9526 4 27 252007 9526 4 27 232010 9534 2 27 333023 9534 2 27 345012 9533 4 27 347001 9533 4 27 348010 9533 4 27 350001 9533 4 27 349004 9532 1 27 39002 9531 2 27 367008 9531 2 27 340011 9530 1 27 342003 9529 2 27 342011 9529 2 27 355023 9528 1 27 340012 9527 3 27 343007 9527 3 27 347011 9527 3 27 352004 9526 4 27 352005 9526 4 27 352006 9526 4 27 352007 9526 4 27 319001 9525 1 27 357008 9524 5 27 357010 9524 5 27 358006 9524 5 27 358008 9524 5 27 359001 9524 5 27 365004 9523 1 27 349004 9532 1 26 49002 9531 2 26 467008 9531 2 26 440011 9530 1 26 442003 9529 2 26 442011 9529 2 26 455023 9528 1 26 440012 9527 3 26 443007 9527 3 26 447011 9527 3 26 452004 9526 4 26 452005 9526 4 26 452006 9526 4 26 452007 9526 4 26 419001 9525 1 26 457008 9524 5 26 457010 9524 5 26 458006 9524 5 26 458008 9524 5 26 459001 9524 5 26 465004 9523 1 26 475003 9522 1 26 476007 9521 4 26 477002 9521 4 26 477003 9521 4 26 478003 9521 4 26 448007 9538 2 25 548011 9538 2 25 558008 9537 3 25 566001 9537 3 25 568001 9537 3 25 554057 9536 1 25 533021 9535 3 25 533029 9535 3 25 534001 9535 3 25 532010 9534 2 25 533023 9534 2 25 545012 9533 4 25 547001 9533 4 25 548010 9533 4 25 550001 9533 4 25 549004 9532 1 25 59002 9531 2 25 567008 9531 2 25 540011 9530 1 25 542003 9529 2 25 542011 9529 2 25 555023 9528 1 25 540012 9527 3 25 543007 9527 3 25 547011 9527 3 25 554057 9536 1 25 633021 9535 3 25 633029 9535 3 25 634001 9535 3 25 632010 9534 2 25 633023 9534 2 25 645012 9533 4 25 647001 9533 4 25 648010 9533 4 25 650001 9533 4 25 649004 9532 1 25 69002 9531 2 25 667008 9531 2 25 640011 9530 1 25 642003 9529 2 25 642011 9529 2 25 655023 9528 1 25 640012 9527 3 25 643007 9527 3 25 647011 9527 3 25 652004 9526 4 25 652005 9526 4 25 652006 9526 4 25 652007 9526 4 25 619001 9525 1 25 645012 9533 4 25 747001 9533 4 25 748010 9533 4 25 750001 9533 4 25 749004 9532 1 25 79002 9531 2 25 767008 9531 2 25 740011 9530 1 25 742003 9529 2 25 742011 9529 2 25 755023 9528 1 25 740012 9527 3 25 743007 9527 3 25 747011 9527 3 25 752004 9526 4 25 752005 9526 4 25 752006 9526 4 25 752007 9526 4 25 719001 9525 1 25 757008 9524 5 25 757010 9524 5 25 758006 9524 5 25 758008 9524 5 25 759001 9524 5 25 765004 9523 1 25 775003 9522 1 25 79002 9531 2 25 867008 9531 2 25 840011 9530 1 25 842003 9529 2 25 842011 9529 2 25 855023 9528 1 25 840012 9527 3 25 843007 9527 3 25 847011 9527 3 25 852004 9526 4 25 852005 9526 4 25 852006 9526 4 25 852007 9526 4 25 819001 9525 1 25 857008 9524 5 25 857010 9524 5 25 858006 9524 5 25 858008 9524 5 25 859001 9524 5 25 865004 9523 1 25 875003 9522 1 25 876007 9521 4 25 877002 9521 4 25 877003 9521 4 25 878003 9521 4 25 874001 9520 1 25 839093 9539 2 24 941006 9539 2 24 948007 9538 2 24 948011 9538 2 24 958008 9537 3 24 966001 9537 3 24 968001 9537 3 24 954057 9536 1 24 933021 9535 3 24 933029 9535 3 24 934001 9535 3 24 932010 9534 2 24 933023 9534 2 24 945012 9533 4 24 947001 9533 4 24 948010 9533 4 24 950001 9533 4 24 949004 9532 1 24 99002 9531 2 24 967008 9531 2 24 940011 9530 1 24 942003 9529 2 24 942011 9529 2 24 955023 9528 1 24 940011 9530 1 24 1042003 9529 2 24 1042011 9529 2 24 1055023 9528 1 24 1040012 9527 3 24 1043007 9527 3 24 1047011 9527 3 24 1052004 9526 4 24 1052005 9526 4 24 1052006 9526 4 24 1052007 9526 4 24 1019001 9525 1 24 1057008 9524 5 24 1057010 9524 5 24 1058006 9524 5 24 1058008 9524 5 24 1059001 9524 5 24 1065004 9523 1 24 1075003 9522 1 24 1076007 9521 4 24 1077002 9521 4 24 1077003 9521 4 24 1078003 9521 4 24 1074001 9520 1 24 1042003 9529 2 24 1142011 9529 2 24 1155023 9528 1 24 1140012 9527 3 24 1143007 9527 3 24 1147011 9527 3 24 1152004 9526 4 24 1152005 9526 4 24 1152006 9526 4 24 1152007 9526 4 24 1119001 9525 1 24 1157008 9524 5 24 1157010 9524 5 24 1158006 9524 5 24 1158008 9524 5 24 1159001 9524 5 24 1165004 9523 1 24 1175003 9522 1 24 1176007 9521 4 24 1177002 9521 4 24 1177003 9521 4 24 1178003 9521 4 24 1174001 9520 1 24 1133021 9518 1 24 1155023 9528 1 22 1240012 9527 3 22 1243007 9527 3 22 1247011 9527 3 22 1252004 9526 4 22 1252005 9526 4 22 1252006 9526 4 22 1252007 9526 4 22 1219001 9525 1 22 1257008 9524 5 22 1257010 9524 5 22 1258006 9524 5 22 1258008 9524 5 22 1259001 9524 5 22 1265004 9523 1 22 1275003 9522 1 22 1276007 9521 4 22 1277002 9521 4 22 1277003 9521 4 22 1278003 9521 4 22 1274001 9520 1 22 1233021 9518 1 22 1240012 9527 3 21 1343007 9527 3 21 1347011 9527 3 21 1352004 9526 4 21 1352005 9526 4 21 1352006 9526 4 21 1352007 9526 4 21 1319001 9525 1 21 1357008 9524 5 21 1357010 9524 5 21 1358006 9524 5 21 1358008 9524 5 21 1359001 9524 5 21 1365004 9523 1 21 1375003 9522 1 21 1376007 9521 4 21 1377002 9521 4 21 1377003 9521 4 21 1378003 9521 4 21 1374001 9520 1 21 1333021 9518 1 21 1352004 9526 4 18 1452005 9526 4 18 1452006 9526 4 18 1452007 9526 4 18 1419001 9525 1 18 1457008 9524 5 18 1457010 9524 5 18 1458006 9524 5 18 1458008 9524 5 18 1459001 9524 5 18 1465004 9523 1 18 1475003 9522 1 18 1476007 9521 4 18 1477002 9521 4 18 1477003 9521 4 18 1478003 9521 4 18 1474001 9520 1 18 1433021 9518 1 18 1419001 9525 1 14 1557008 9524 5 14 1557010 9524 5 14 1558006 9524 5 14 1558008 9524 5 14 1559001 9524 5 14 1565004 9523 1 14 1575003 9522 1 14 1576007 9521 4 14 1577002 9521 4 14 1577003 9521 4 14 1578003 9521 4 14 1574001 9520 1 14 1533021 9518 1 14 1557008 9524 5 13 1657010 9524 5 13 1658006 9524 5 13 1658008 9524 5 13 1659001 9524 5 13 1665004 9523 1 13 1675003 9522 1 13 1676007 9521 4 13 1677002 9521 4 13 1677003 9521 4 13 1678003 9521 4 13 1674001 9520 1 13 1633021 9518 1 13 1665004 9523 1 8 1775003 9522 1 8 1776007 9521 4 8 1777002 9521 4 8 1777003 9521 4 8 1778003 9521 4 8 1774001 9520 1 8 1733021 9518 1 8 1775003 9522 1 7 1876007 9521 4 7 1877002 9521 4 7 1877003 9521 4 7 1878003 9521 4 7 1874001 9520 1 7 1833021 9518 1 7 1876007 9521 4 6 1977002 9521 4 6 1977003 9521 4 6 1978003 9521 4 6 1974001 9520 1 6 1933021 9518 1 6 1974001 9520 1 2 2033021 9518 1 2 2033021 9518 1 1 21如果看到输出,则在 Count_sum = 27内有2个不同的情节,对于 Count_sum = 25, Count_sum = 24等2集。If you see the output, within Count_sum = 27 there are 2 distinct episodes, 4 episodes for Count_sum = 25, 2 episodes for Count_sum = 24, etc.. Episode_ID列从1到21其中1是具有最大Count_group的情节,而当有2个或更多情节时是同一Count_group,需要按Day减少= TRUE进行排序。The Episode_ID column starts from 1 to 21 where 1 is the episode with largest Count_group and when there are 2 or more episode with same Count_group they need to be ordered by Day decreasing = TRUE.这是我从(更新)基于两列向data.frame添加索引列,但不起作用:Here what I tried from (Update) Add index column to data.frame based on two columns but does not work: 1)df$Episode_ID <- cumsum(c(1,abs(diff(df$Day)) > 1) + c(0,diff(df$Count_sum) != 0) > 0) 2)library(data.table)Episode_ID <-setDT(df)[, if(Count_sum[1L] < .N) ((seq_len(.N)-1) %/% Count_sum[1L])+1 else as.numeric(Count_sum), rleid(Count_sum)][, rleid(V1)]df = df[, Episode_ID := Episode_ID]有任何建议吗?推荐答案我必须承认我还没有完全理解所有细节,尤其是没有明确定义 episode ,所提供的数据在我看来与 Count_sum 是计算出来的。I must admit I haven't fully understood all the details, in particular, there is no explicit definition of episode and the provided data appear to me as not fully matching the description of how Count_sum is computed.不过,我能够重现预期的结果。Nevertheless, I was able to reproduce the expected results.建议的解决方案基于以下观察结果: Day 由许多单调递减的序列组成(大概是OP用 episodes 表示的意思)。因此,任务是识别新序列开始处的中断,增加序列计数器并使用该序列ID为该序列的所有后续行编号。The proposed solution is based on the observation that Day is consisting of a number of monotonically decreasing sequences (which presumably is what the OP means by episodes). So, the task is to identify the breaks where a new sequence starts, to advance the sequence counter, and to number all subsequent rows of that sequence with that sequence id.这是通过library(data.table) # CRAN version 1.10.4 usedsetDT(expected)[, Sequence_ID := cumsum(Day - shift(Day, fill = -1L) > 0)]请注意,此处由OP提供的第二个数据集用于证明计算与预期结果一致。例如,第一次休息发生在第29行和第30行之间:Note that here the second data set provided by the OP is used to demonstrate that the computations are in line with the expected results. For instance, the first break happens between rows 29 and 30:expected[28:31]# ID Day Count Count_sum Episode_ID Sequence_ID#1: 58008 9524 5 29 1 1#2: 59001 9524 5 29 1 1#3: 58008 9537 3 27 2 2#4: 66001 9537 3 27 2 2表达式已识别出从9524天到9537天的跳跃。The expression has recognized the jump from day 9524 to 9537.不幸的是,结尾处存在差异:Unfortunately, there is a discrepancy at the end:tail(expected, 11)# ID Day Count Count_sum Episode_ID Sequence_ID# 1: 74001 9520 1 7 18 18# 2: 33021 9518 1 7 18 18# 3: 76007 9521 4 6 19 19# 4: 77002 9521 4 6 19 19# 5: 77003 9521 4 6 19 19# 6: 78003 9521 4 6 19 19# 7: 74001 9520 1 6 19 19# 8: 33021 9518 1 6 19 19# 9: 74001 9520 1 2 20 20#10: 33021 9518 1 2 20 20#11: 33021 9518 1 1 21 20 OP具有将最后一行分配给新剧集,尽管日子依旧单调递减。如果这仅是提供的数据中的错误,请完成。The OP has assigned the last row to a new episode although the days are still in monotonically decreasing order. If this is just an error in the provided data, we are done.如果这是故意的,则必须在使用 Count_sum 的变化c $ c> data.table 方便的 rleid()函数:If this is intentional, then changes in Count_sum have to be considered in the numbering of episodes by using data.table's handy rleid() function:expected[, new_Episode_ID := rleid(Sequence_ID, Count_sum)]tail(expected, 5L)# ID Day Count Count_sum Episode_ID Sequence_ID new_Episode_ID#1: 74001 9520 1 6 19 19 19#2: 33021 9518 1 6 19 19 19#3: 74001 9520 1 2 20 20 20#4: 33021 9518 1 2 20 20 20#5: 33021 9518 1 1 21 20 21这也可以写得更简洁作为单线This can also be written more concisely as a one-linerexpected[, new_Episode_ID := rleid(cumsum(Day - shift(Day, fill = -1L) > 0), Count_sum)] 数据Dataexpected <- structure(list(ID = c(33021L, 33029L, 34001L, 32010L, 33023L,45012L, 47001L, 48010L, 50001L, 49004L, 9002L, 67008L, 40011L,42003L, 42011L, 55023L, 40012L, 43007L, 47011L, 52004L, 52005L,52006L, 52007L, 19001L, 57008L, 57010L, 58006L, 58008L, 59001L,58008L, 66001L, 68001L, 54057L, 33021L, 33029L, 34001L, 32010L,33023L, 45012L, 47001L, 48010L, 50001L, 49004L, 9002L, 67008L,40011L, 42003L, 42011L, 55023L, 40012L, 43007L, 47011L, 52004L,52005L, 52006L, 52007L, 32010L, 33023L, 45012L, 47001L, 48010L,50001L, 49004L, 9002L, 67008L, 40011L, 42003L, 42011L, 55023L,40012L, 43007L, 47011L, 52004L, 52005L, 52006L, 52007L, 19001L,57008L, 57010L, 58006L, 58008L, 59001L, 65004L, 49004L, 9002L,67008L, 40011L, 42003L, 42011L, 55023L, 40012L, 43007L, 47011L,52004L, 52005L, 52006L, 52007L, 19001L, 57008L, 57010L, 58006L,58008L, 59001L, 65004L, 75003L, 76007L, 77002L, 77003L, 78003L,48007L, 48011L, 58008L, 66001L, 68001L, 54057L, 33021L, 33029L,34001L, 32010L, 33023L, 45012L, 47001L, 48010L, 50001L, 49004L,9002L, 67008L, 40011L, 42003L, 42011L, 55023L, 40012L, 43007L,47011L, 54057L, 33021L, 33029L, 34001L, 32010L, 33023L, 45012L,47001L, 48010L, 50001L, 49004L, 9002L, 67008L, 40011L, 42003L,42011L, 55023L, 40012L, 43007L, 47011L, 52004L, 52005L, 52006L,52007L, 19001L, 45012L, 47001L, 48010L, 50001L, 49004L, 9002L,67008L, 40011L, 42003L, 42011L, 55023L, 40012L, 43007L, 47011L,52004L, 52005L, 52006L, 52007L, 19001L, 57008L, 57010L, 58006L,58008L, 59001L, 65004L, 75003L, 9002L, 67008L, 40011L, 42003L,42011L, 55023L, 40012L, 43007L, 47011L, 52004L, 52005L, 52006L,52007L, 19001L, 57008L, 57010L, 58006L, 58008L, 59001L, 65004L,75003L, 76007L, 77002L, 77003L, 78003L, 74001L, 39093L, 41006L,48007L, 48011L, 58008L, 66001L, 68001L, 54057L, 33021L, 33029L,34001L, 32010L, 33023L, 45012L, 47001L, 48010L, 50001L, 49004L,9002L, 67008L, 40011L, 42003L, 42011L, 55023L, 40011L, 42003L,42011L, 55023L, 40012L, 43007L, 47011L, 52004L, 52005L, 52006L,52007L, 19001L, 57008L, 57010L, 58006L, 58008L, 59001L, 65004L,75003L, 76007L, 77002L, 77003L, 78003L, 74001L, 42003L, 42011L,55023L, 40012L, 43007L, 47011L, 52004L, 52005L, 52006L, 52007L,19001L, 57008L, 57010L, 58006L, 58008L, 59001L, 65004L, 75003L,76007L, 77002L, 77003L, 78003L, 74001L, 33021L, 55023L, 40012L,43007L, 47011L, 52004L, 52005L, 52006L, 52007L, 19001L, 57008L,57010L, 58006L, 58008L, 59001L, 65004L, 75003L, 76007L, 77002L,77003L, 78003L, 74001L, 33021L, 40012L, 43007L, 47011L, 52004L,52005L, 52006L, 52007L, 19001L, 57008L, 57010L, 58006L, 58008L,59001L, 65004L, 75003L, 76007L, 77002L, 77003L, 78003L, 74001L,33021L, 52004L, 52005L, 52006L, 52007L, 19001L, 57008L, 57010L,58006L, 58008L, 59001L, 65004L, 75003L, 76007L, 77002L, 77003L,78003L, 74001L, 33021L, 19001L, 57008L, 57010L, 58006L, 58008L,59001L, 65004L, 75003L, 76007L, 77002L, 77003L, 78003L, 74001L,33021L, 57008L, 57010L, 58006L, 58008L, 59001L, 65004L, 75003L,76007L, 77002L, 77003L, 78003L, 74001L, 33021L, 65004L, 75003L,76007L, 77002L, 77003L, 78003L, 74001L, 33021L, 75003L, 76007L,77002L, 77003L, 78003L, 74001L, 33021L, 76007L, 77002L, 77003L,78003L, 74001L, 33021L, 74001L, 33021L, 33021L), Day = c(9535L,9535L, 9535L, 9534L, 9534L, 9533L, 9533L, 9533L, 9533L, 9532L,9531L, 9531L, 9530L, 9529L, 9529L, 9528L, 9527L, 9527L, 9527L,9526L, 9526L, 9526L, 9526L, 9525L, 9524L, 9524L, 9524L, 9524L,9524L, 9537L, 9537L, 9537L, 9536L, 9535L, 9535L, 9535L, 9534L,9534L, 9533L, 9533L, 9533L, 9533L, 9532L, 9531L, 9531L, 9530L,9529L, 9529L, 9528L, 9527L, 9527L, 9527L, 9526L, 9526L, 9526L,9526L, 9534L, 9534L, 9533L, 9533L, 9533L, 9533L, 9532L, 9531L,9531L, 9530L, 9529L, 9529L, 9528L, 9527L, 9527L, 9527L, 9526L,9526L, 9526L, 9526L, 9525L, 9524L, 9524L, 9524L, 9524L, 9524L,9523L, 9532L, 9531L, 9531L, 9530L, 9529L, 9529L, 9528L, 9527L,9527L, 9527L, 9526L, 9526L, 9526L, 9526L, 9525L, 9524L, 9524L,9524L, 9524L, 9524L, 9523L, 9522L, 9521L, 9521L, 9521L, 9521L,9538L, 9538L, 9537L, 9537L, 9537L, 9536L, 9535L, 9535L, 9535L,9534L, 9534L, 9533L, 9533L, 9533L, 9533L, 9532L, 9531L, 9531L,9530L, 9529L, 9529L, 9528L, 9527L, 9527L, 9527L, 9536L, 9535L,9535L, 9535L, 9534L, 9534L, 9533L, 9533L, 9533L, 9533L, 9532L,9531L, 9531L, 9530L, 9529L, 9529L, 9528L, 9527L, 9527L, 9527L,9526L, 9526L, 9526L, 9526L, 9525L, 9533L, 9533L, 9533L, 9533L,9532L, 9531L, 9531L, 9530L, 9529L, 9529L, 9528L, 9527L, 9527L,9527L, 9526L, 9526L, 9526L, 9526L, 9525L, 9524L, 9524L, 9524L,9524L, 9524L, 9523L, 9522L, 9531L, 9531L, 9530L, 9529L, 9529L,9528L, 9527L, 9527L, 9527L, 9526L, 9526L, 9526L, 9526L, 9525L,9524L, 9524L, 9524L, 9524L, 9524L, 9523L, 9522L, 9521L, 9521L,9521L, 9521L, 9520L, 9539L, 9539L, 9538L, 9538L, 9537L, 9537L,9537L, 9536L, 9535L, 9535L, 9535L, 9534L, 9534L, 9533L, 9533L,9533L, 9533L, 9532L, 9531L, 9531L, 9530L, 9529L, 9529L, 9528L,9530L, 9529L, 9529L, 9528L, 9527L, 9527L, 9527L, 9526L, 9526L,9526L, 9526L, 9525L, 9524L, 9524L, 9524L, 9524L, 9524L, 9523L,9522L, 9521L, 9521L, 9521L, 9521L, 9520L, 9529L, 9529L, 9528L,9527L, 9527L, 9527L, 9526L, 9526L, 9526L, 9526L, 9525L, 9524L,9524L, 9524L, 9524L, 9524L, 9523L, 9522L, 9521L, 9521L, 9521L,9521L, 9520L, 9518L, 9528L, 9527L, 9527L, 9527L, 9526L, 9526L,9526L, 9526L, 9525L, 9524L, 9524L, 9524L, 9524L, 9524L, 9523L,9522L, 9521L, 9521L, 9521L, 9521L, 9520L, 9518L, 9527L, 9527L,9527L, 9526L, 9526L, 9526L, 9526L, 9525L, 9524L, 9524L, 9524L,9524L, 9524L, 9523L, 9522L, 9521L, 9521L, 9521L, 9521L, 9520L,9518L, 9526L, 9526L, 9526L, 9526L, 9525L, 9524L, 9524L, 9524L,9524L, 9524L, 9523L, 9522L, 9521L, 9521L, 9521L, 9521L, 9520L,9518L, 9525L, 9524L, 9524L, 9524L, 9524L, 9524L, 9523L, 9522L,9521L, 9521L, 9521L, 9521L, 9520L, 9518L, 9524L, 9524L, 9524L,9524L, 9524L, 9523L, 9522L, 9521L, 9521L, 9521L, 9521L, 9520L,9518L, 9523L, 9522L, 9521L, 9521L, 9521L, 9521L, 9520L, 9518L,9522L, 9521L, 9521L, 9521L, 9521L, 9520L, 9518L, 9521L, 9521L,9521L, 9521L, 9520L, 9518L, 9520L, 9518L, 9518L), Count = c(3L,3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 3L,3L, 3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 1L,3L, 3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,3L, 3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 2L, 2L,1L, 2L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L,5L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 3L,3L, 1L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 2L, 2L, 1L, 2L,2L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 1L,2L, 2L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 4L, 4L,4L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 3L,3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L, 4L, 4L,4L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 4L,4L, 4L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 3L,3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L, 4L, 4L,4L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L,5L, 5L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L,4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 1L,3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L,4L, 4L, 4L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 5L, 1L,1L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L,4L, 4L, 4L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 4L, 4L, 4L, 4L,1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 1L,1L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L), Count_sum = c(29L, 29L,29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L,29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L,29L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,27L, 27L, 27L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,26L, 26L, 26L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L,25L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L,24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 22L, 22L, 22L, 22L, 22L,22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,22L, 22L, 22L, 22L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 18L,18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,18L, 18L, 18L, 18L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,13L, 13L, 13L, 13L, 13L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 7L,7L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 2L, 2L, 1L), Episode_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 21L)), .Names = c("ID", "Day", "Count", "Count_sum","Episode_ID"), row.names = c(NA, -395L), class = "data.frame") 这篇关于根据按特定日期长度的情节分组的行将新列添加到data.frame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云! 08-30 12:06