


Been trying to solve this for weeks, but can't seem to get it. I have a data frame like this:


    post_id user_id
1    post-1   user1
2    post-2   user2
3 comment-1   user1
4 comment-2   user3
5 comment-3   user4
6    post-3   user2
7 comment-4   user2

I want to create a new variable, parent_id, so that for each observation the following steps are performed:

  1. Check whether post_id is a post or a comment.
  2. If post_id is a post, parent_id should equal the earliest post_id in the whole data frame.
  3. If post_id is the first post, parent_id should be NA.
  4. If post_id is a comment, parent_id should equal the first post_id it encounters when looking back up the data frame, i.e. the most recent preceding post.

The expected result:


    post_id user_id parent_id_man
1    post-1   user1            NA
2    post-2   user2        post-1
3 comment-1   user1        post-2
4 comment-2   user3        post-2
5 comment-3   user4        post-2
6    post-3   user2        post-1
7 comment-4   user2        post-3
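All four rules can be written as one dplyr pipeline by numbering the posts with cumsum(): every post row starts a new block, so each comment's parent is the post that opened its block. This is a sketch of a different technique, not the code below; it assumes the rows are in chronological order, that the first row is a post, and that "earliest post" means the first post in row order. The helper columns type, block, first_post and block_post are illustrative names.

```r
library(dplyr)

df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3", "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    type       = sub("-.*$", "", post_id),       # rule 1: "post" or "comment"
    block      = cumsum(type == "post"),         # increments on every post row
    first_post = first(post_id[type == "post"])  # earliest post (by row order)
  ) %>%
  group_by(block) %>%
  mutate(block_post = first(post_id)) %>%        # the post that opens this block
  ungroup() %>%
  mutate(parent_id = case_when(
    type == "comment"     ~ block_post,          # rule 4
    post_id == first_post ~ NA_character_,       # rule 3
    TRUE                  ~ first_post           # rule 2
  )) %>%
  select(post_id, user_id, parent_id)

result
```

Because a block always begins at a post row, first(post_id) within a block is that post; a comment appearing before any post would fall into block 0 and break this assumption.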


    library(dplyr)
    library(tidyr)  # separate() comes from tidyr

    # Prepare data: split e.g. "comment-3" into type = "comment", number = 3
    df <- df %>% separate(post_id, into = c("type", "number"), sep = "-", remove = FALSE)
    df$number <- as.numeric(df$number)
    df <- df %>% mutate(comment_number = ifelse(type == "comment", number, 99999))
    df <- df %>% mutate(post_number = ifelse(type == "post", number, 99999))

    # Create parent_id column: posts get the earliest post, comments get 0 for now
    df <- df %>% mutate(parent_id = ifelse(type == "post", paste("post-", min(post_number), sep = ""), 0))
    # The earliest post itself gets "NA" (the string, not a missing value)
    df <- df %>% mutate(parent_id = ifelse(parent_id == post_id, "NA", parent_id))
    df <- df %>% select(-comment_number, -post_number)

With that code I am able to perform steps 1, 2 and 3, but step 4 is beyond me. I get the feeling that some kind of conditional lag should be able to solve it, but I can't work out how to do it.
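The conditional-lag idea works if the lag is allowed to skip over comments. A plain dplyr::lag(post_id) looks back exactly one row, so comment-2 would get comment-1 as its parent; carrying the last post forward (last observation carried forward) skips them instead. A minimal sketch of rule 4 only, using tidyr::fill() and assuming the rows are in chronological order; the helper columns type and last_post are illustrative names.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3", "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    type = sub("-.*$", "", post_id),
    # keep the id on post rows only; comment rows start out as NA
    last_post = if_else(type == "post", post_id, NA_character_)
  ) %>%
  # each NA takes the nearest non-NA value above it
  fill(last_post, .direction = "down") %>%
  # rule 4 only: comments point at the most recent post above them
  mutate(parent_id = if_else(type == "comment", last_post, NA_character_))

result$parent_id[result$type == "comment"]
```

Post rows are left as NA here, so this would be combined with the existing code for steps 1 to 3.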


Any ideas would be very much appreciated!



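Building on your solution, you can look up the nearest preceding post for each comment with findInterval():

    # row positions of the posts and of the comments
    x <- which(df$type == 'post')
    z <- which(df$type == 'comment')
    # findInterval(z, x) gives, for each comment, the index in x of the last post
    # at or before it; the rows with parent_id == 0 are exactly the comments, in order
    df$parent_id[df$parent_id == 0] <- df$post_id[x[findInterval(z, x)]]
    df$parent_id
    #[1] "NA"     "post-1" "post-2" "post-2" "post-2" "post-1" "post-3"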
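To see why the lookup works, here is findInterval() on the row positions from the example. For each comment it returns how many post positions are less than or equal to the comment's row, which is the index in x of the closest post above it. findInterval() is vectorised over its first argument, so it can take all comment positions at once.

```r
type <- c("post", "post", "comment", "comment", "comment", "post", "comment")
x <- which(type == "post")     # 1 2 6
z <- which(type == "comment")  # 3 4 5 7

findInterval(z, x)     # index into x of the last post at or before each comment
# [1] 2 2 2 3
x[findInterval(z, x)]  # row number of that post
# [1] 2 2 2 6
```

A comment appearing before the first post would get 0, and x[0] drops it silently, so the replacement would no longer line up with the comments; the example data start with a post, so this does not arise here.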


09-14 22:48