R: conditional lagging - how to lag a certain number of cells until a condition is met?
Question
I've been trying to solve this for weeks, but can't seem to get it.
I have the following data frame:
post_id user_id
1 post-1 user1
2 post-2 user2
3 comment-1 user1
4 comment-2 user3
5 comment-3 user4
6 post-3 user2
7 comment-4 user2
I want to create a new variable parent_id, so that for each observation the following steps are performed:
1. Check whether post_id is a post or a comment.
2. If post_id is a post, then parent_id should equal the earliest post_id of the whole data frame.
3. If post_id is the first post, then parent_id should equal NA.
4. If post_id is a comment, then parent_id should equal the first post_id it encounters when looking upward, i.e. the nearest preceding post.
The output should look like this:
post_id user_id parent_id_man
1 post-1 user1 NA
2 post-2 user2 post-1
3 comment-1 user1 post-2
4 comment-2 user3 post-2
5 comment-3 user4 post-2
6 post-3 user2 post-1
7 comment-4 user2 post-3
I tried the following:
library(dplyr)
library(tidyr)

# Prepare data: split post_id into its type ("post"/"comment") and its number
df <- df %>% separate(post_id, into = c("type", "number"), sep = "-", remove = FALSE)
df$number <- as.numeric(df$number)
df <- df %>% mutate(comment_number = ifelse(type == "comment", number, 99999))
df <- df %>% mutate(post_number = ifelse(type == "post", number, 99999))

# Create parent_id column: posts get the earliest post, comments get a placeholder 0
df <- df %>% mutate(parent_id = ifelse(type == "post", paste("post-", min(post_number), sep = ""), 0))
# The first post has no parent (note: this stores the string "NA", not a missing value)
df <- df %>% mutate(parent_id = ifelse(parent_id == post_id, "NA", parent_id))
df <- df %>% select(-comment_number, -post_number)
With that code I am able to perform steps 1, 2 and 3, but step 4 is beyond me. I get the feeling that some kind of conditional lagging should be able to solve it, but I can't come up with how to do it.
Any ideas would be very much appreciated!
Recommended answer
Building on your solution,
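# Row indices of posts and of comments
x <- which(df$type == 'post')
z <- which(df$type == 'comment')
# findInterval(z, x) gives, for each comment row, the position in x of the last post row before it
# (findInterval is vectorized, so no sapply is needed)
df$parent_id[df$parent_id == 0] <- df$post_id[x[findInterval(z, x)]]
df$parent_id
#[1] "NA" "post-1" "post-2" "post-2" "post-2" "post-1" "post-3"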
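To see why findInterval does the "lag until a condition is met" work, here is a minimal self-contained sketch in base R. It rebuilds the sample data frame from the question rather than relying on the dplyr preparation step. The helper names is_post, post_rows and last_post are illustrative only, and, unlike the output above, it stores a real missing value instead of the string "NA" for the first post. Treat it as one possible variant under those assumptions, not as the answer's exact method.

```r
# Sample data rebuilt from the question
df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3", "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)

is_post   <- startsWith(df$post_id, "post")  # TRUE for posts, FALSE for comments
post_rows <- which(is_post)                  # row numbers of the posts

# For every row, the position in post_rows of the last post at or before it.
# A row that precedes every post gets 0, which is turned into NA here.
last_post <- findInterval(seq_len(nrow(df)), post_rows)
last_post[last_post == 0] <- NA

first_post <- df$post_id[post_rows[1]]

df$parent_id <- ifelse(
  is_post,
  first_post,                        # posts point at the earliest post
  df$post_id[post_rows[last_post]]   # comments point at the nearest preceding post
)
df$parent_id[post_rows[1]] <- NA     # the first post has no parent

print(df)
```

Because post_rows is sorted, findInterval only has to count how many post rows lie at or before each row, which is what carries the most recent post forward over the comments that follow it.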