Problem description
I have a df with two variables, names and dates. I would like to create a new column (new_dates) that takes the first date belonging to each person (each person should have just one repeated date in this column) and adds a further 30 days on each successive row.
The desired output is below: row 1 for each person holds the original date, row 2 holds row 1 + 30 days, row 3 holds row 2 + 30 days, and so on.
dff
names dates new_dates
1 john 2010-06-01 2010-06-01
2 john 2010-06-01 2010-07-01
3 john 2010-06-01 2010-07-31
4 john 2010-06-01 2010-08-30
5 mary 2010-07-09 2010-07-09
6 mary 2010-07-09 2010-08-08
7 mary 2010-07-09 2010-09-07
8 mary 2010-07-09 2010-10-07
9 tom 2010-06-01 2010-06-01
10 tom 2010-06-01 2010-07-01
11 tom 2010-06-01 2010-07-31
12 tom 2010-06-01 2010-08-30
I thought I could use transform for this. Here is my attempt, which doesn't quite work:
dt <- transform(df, new_date = c(dates[2]+30, NA))
Recommended answer
data.table makes this easy. Once you convert to a data table, it's basically one command. The main problem with your version is that you need to split the data by name first, so you can get the minimum date for each person, and then add the appropriate multiple of 30 days to each date.
library(data.table)
df$dates <- as.Date(df$dates)
dt <- as.data.table(df)
dt[,
  list(dates, new_dates = min(dates) + 0:(length(dates) - 1L) * 30),
  by = names
]
# names dates new_dates
# 1: john 2010-06-01 2010-06-01
# 2: john 2010-06-01 2010-07-01
# 3: john 2010-06-01 2010-07-31
# 4: john 2010-06-01 2010-08-30
# 5: mary 2010-07-09 2010-07-09
# 6: mary 2010-07-09 2010-08-08
# 7: mary 2010-07-09 2010-09-07
# 8: mary 2010-07-09 2010-10-07
# 9: tom 2010-06-01 2010-06-01
# 10: tom 2010-06-01 2010-07-01
# 11: tom 2010-06-01 2010-07-31
# 12: tom 2010-06-01 2010-08-30
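The per-group arithmetic can also be checked without data.table. The following is a minimal base R sketch, not part of the original answer. It rebuilds the sample data from the question (the construction of `df` here is an assumption) and uses `ave` to get each person's first date and each row's position within its group.

```r
# Sample data rebuilt from the question's table (assumed construction)
df <- data.frame(
  names = rep(c("john", "mary", "tom"), each = 4),
  dates = as.Date(rep(c("2010-06-01", "2010-07-09", "2010-06-01"), each = 4)),
  stringsAsFactors = FALSE
)

# Earliest date per person, as days since 1970-01-01, repeated on every row
first_day <- ave(as.numeric(df$dates), df$names, FUN = min)

# Zero-based row position within each person: 0, 1, 2, ...
step <- ave(seq_along(df$names), df$names,
            FUN = function(i) seq_along(i) - 1L)

# First date plus the appropriate multiple of 30 days
df$new_dates <- as.Date(first_day + 30 * step, origin = "1970-01-01")
print(df)
```

Working on the numeric day count and converting back with `as.Date` at the end keeps the date class handling explicit, at the cost of one extra step.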
Here is a version that hopefully shows why yours didn't work. I still prefer data.table, but since this is very close to what you were doing, it should make clear what you need to change:
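# Add new_dates to the rows of a single person
re_date <- function(df) {
  transform(
    df[order(df$dates), ],
    new_dates = min(dates) + 30 * 0:(length(dates) - 1L)
  )
}
# Split by person, apply re_date to each piece, then recombine
do.call(rbind, lapply(split(df, df$names), re_date))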
Starting with the bottom line (do.call...), the split call makes a list with three data frames: one with the values for John, one for Mary, and one for Tom. The lapply then runs each of those data frames through the re_date function, which adds the new_dates column, and finally the do.call/rbind stitches it back together into one data frame.
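To make the three stages easier to inspect, the same pipeline can be written with each intermediate result stored in its own variable. This is only an illustrative sketch: the sample `df` is rebuilt here as an assumption, and the names `pieces`, `dated`, and `result` are hypothetical.

```r
# Sample data rebuilt from the question's table (assumed construction)
df <- data.frame(
  names = rep(c("john", "mary", "tom"), each = 4),
  dates = as.Date(rep(c("2010-06-01", "2010-07-09", "2010-06-01"), each = 4)),
  stringsAsFactors = FALSE
)

# Add new_dates to the rows of a single person
re_date <- function(d) {
  transform(
    d[order(d$dates), ],
    new_dates = min(dates) + 30 * (seq_along(dates) - 1L)
  )
}

# Stage 1: split into a named list with one data frame per person
pieces <- split(df, df$names)
print(names(pieces))

# Stage 2: run each piece through re_date
dated <- lapply(pieces, re_date)

# Stage 3: stack the pieces back into one data frame
# (row names get prefixed with the list names by rbind)
result <- do.call(rbind, dated)
print(result)
```

Inspecting `pieces[["mary"]]` before and after stage 2 shows that `min(dates)` is evaluated on one person's rows only, which is the step the original transform attempt was missing.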