Problem description

I seem to have stumbled upon a mutate/lag/ifelse behaviour that I cannot explain. I have the following (simplified) dataframe:

test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "START", "END"),
                   stringsAsFactors = FALSE)

> test

  type
1 START
2   END
3 START
4 START
5 START
6 START
7 START
8   END

I would like to modify the column type so that it contains a sequence of alternating START and END pairs (note that in the test data frame only runs of START are possible; END is never repeated):

> desired

  type
1 START
2   END
3 START
4   END
5 START
6   END
7 START
8   END

I thought I could achieve my goal with the following code:

library(dplyr)

test %>%
 mutate(type = ifelse(type == "START" &
                      dplyr::lag(type, n=1, default="END") == "START" &
                      dplyr::lead(type, n=1, default="END") == "START", "END", type))

The code should detect rows in which START is preceded by a START and followed by a START, in which case the type value is changed to END. After this change, the following START (row number 5 of test) should not be matched, since its previous type value is now END. Unfortunately, the output of the command is the following:

   type
1 START
2   END
3 START
4   END
5   END
6   END
7 START
8   END 

It's as if the value seen by lag is not affected by mutate. Is this how it is supposed to work? Is there a way to write the code so that lag sees the effect of mutate on the previous row?

Versions: R version 3.2.3 (2015-12-10), dplyr_0.4.3

UPDATE: The reason why the above code doesn't work is explained by Paul Rougieux below: lead and lag are fixed and do not take further modifications into account. So I guess the correct answer is "it cannot be done straightforwardly using dplyr".
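For reference, here is a minimal base R sketch of the row-by-row behaviour the question expects: an explicit loop that updates the vector in place, so the "previous" value it checks is the already-modified one rather than the original. It applies the same rule as the ifelse() call above, assuming the eight-row data printed earlier; the helper name alternate_starts_loop is hypothetical.

```r
# Hypothetical helper: applies the question's rule one row at a time.
# Because `out` is updated inside the loop, the check on the previous row
# sees the value that was already changed in this pass.
alternate_starts_loop <- function(type) {
  out <- type
  n <- length(out)
  for (i in seq_len(n)) {
    prev <- if (i == 1) "END" else out[i - 1]   # like lag(default = "END"), but on the updated vector
    nxt  <- if (i == n) "END" else type[i + 1]  # like lead(default = "END"); row i + 1 is not modified yet
    if (out[i] == "START" && prev == "START" && nxt == "START") {
      out[i] <- "END"
    }
  }
  out
}

# Eight-row example data, as printed in the question
test <- data.frame(type = c("START", "END", "START", "START", "START",
                            "START", "START", "END"),
                   stringsAsFactors = FALSE)
test$type <- alternate_starts_loop(test$type)
print(test)
```

For the eight-row example this gives START, END, START, END, START, END, START, END, i.e. the desired column. With dplyr loaded, the same helper can also be called from a pipeline, e.g. test %>% mutate(type = alternate_starts_loop(type)), since mutate() passes the whole column to the function at once.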

Solution

Defining lag and lead variables separately in mutate() will show you that your call to ifelse(type == "START" & lag == "START" & lead == "START", "END" , type) is not going to work:

library(dplyr)

test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "END"), stringsAsFactors = FALSE)
test %>%
    mutate(lag = dplyr::lag(type, n=1, default="END"),
           lead = dplyr::lead(type, n=1, default="END"),
           type2 = ifelse(type == "START" & lag == "START" & lead == "START",
                          "END", type))

#   type   lag  lead type2
#1 START   END   END START
#2   END START START   END
#3 START   END START START
#4 START START START   END
#5 START START START   END
#6 START START   END START
#7   END START   END   END

dplyr::mutate() modifies the vector as a whole: lead and lag are computed from the original type vector, so they do not take into account any modification made to it in the same call. What you want in this case is the `Reduce()` function; see help(Reduce).
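The Reduce() hint can be sketched as follows, assuming the same rule as the question's ifelse() call and the eight-row data from the question. With accumulate = TRUE, the accumulator carries the updated value of the previous row forward to the next step, which is what lag() cannot do inside a single mutate() expression. The helper name alternate_starts_reduce is hypothetical; this is one possible reading of the hint, not the answerer's own code.

```r
# Hypothetical helper built on base R's Reduce(). The accumulator `prev`
# holds the already-updated value of the previous row.
alternate_starts_reduce <- function(type) {
  nxt <- c(type[-1], "END")  # same values as dplyr::lead(type, default = "END")
  step <- function(prev, i) {
    if (type[i] == "START" && prev == "START" && nxt[i] == "START") "END" else type[i]
  }
  # init = "END" plays the role of lag(default = "END");
  # accumulate = TRUE keeps one result per row (plus the initial seed).
  res <- Reduce(step, seq_len(length(type)), init = "END", accumulate = TRUE)
  unlist(res)[-1]  # drop the initial "END" seed
}

types <- c("START", "END", "START", "START", "START", "START", "START", "END")
alternate_starts_reduce(types)
```

Inside a dplyr pipeline it can be called as test %>% mutate(type = alternate_starts_reduce(type)). As with the question's rule, a run with an even number of STARTs keeps its last START unchanged, because the next value for that row is END.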

11-03 03:14