group_by() into fill() not working as expected

Problem description

I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.

library(dplyr)
library(tidyr)

df <- data.frame(id=c(1,1,2,2,3,3),
                 email=c('[email protected]', NA, 'joe@email.com', NA, NA, NA))
df2 <- df %>% group_by(id) %>% fill(email)

This results in:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 [email protected]
2     1 [email protected]
3     2 joe@email.com
4     2 joe@email.com
5     3 joe@email.com
6     3 joe@email.com

I expect it to be:

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 [email protected]
2     1 [email protected]
3     2 joe@email.com
4     2 joe@email.com
5     3 NA
6     3 NA

The reason I expect the latter is that group_by's documentation says, "The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed 'by group'." The group in this case is determined by the id variable, and the following operation is fill(email). However, that is clearly not what happens: fill() carries the last value across group boundaries, so id 3 receives the email from id 2.
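
To illustrate the "by group" behavior that expectation rests on, here is a minimal sketch using mutate() on a grouped tbl. The email values are placeholders (assumptions, not the question's data), and the n_missing column name is made up for this example.

```r
library(dplyr)

# Same shape as the question's data; the email values are placeholders.
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c("a@example.com", NA, "b@example.com", NA, NA, NA),
                 stringsAsFactors = FALSE)

# mutate() on a grouped tbl evaluates its expression once per group,
# so every row receives the number of missing emails within its own id.
counts <- df %>%
  group_by(id) %>%
  mutate(n_missing = sum(is.na(email))) %>%
  ungroup()

counts$n_missing
# 1 1 1 1 2 2
```

This per-group evaluation is what one would expect fill(email) to honor as well.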


And before anybody asks, it makes no difference if both fields are character rather than numeric and factor.


UPDATE: @aosmith pointed out an open issue on GitHub (hadley/tidyr/issues/129). I'm going to say that there won't be a proper solution to this problem until that issue is resolved. Everything else would just be a workaround. So, if somebody makes a successful PR addressing that issue and posts it here, I'd be happy to mark it as the solution.

Solution

Luckily you can still use zoo::na.locf for this:

df %>% 
    group_by(id) %>% 
    mutate(email = zoo::na.locf(email, na.rm = FALSE))  
# Source: local data frame [6 x 2]
# Groups: id [3]
# 
#      id         email
#   (dbl)        (fctr)
# 1     1 [email protected]
# 2     1 [email protected]
# 3     2 joe@email.com
# 4     2 joe@email.com
# 5     3            NA
# 6     3            NA
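
If adding zoo as a dependency is not desirable, the same grouped carry-forward can be sketched with a small hand-written helper. This is only an illustration: locf() below is a hypothetical function written for this example (it is not part of dplyr, tidyr, or zoo), and the email values are placeholders.

```r
library(dplyr)

# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs stay NA, matching zoo::na.locf(x, na.rm = FALSE).
locf <- function(x) {
  seen <- cumsum(!is.na(x))                      # how many non-NA values so far
  last_idx <- c(NA, which(!is.na(x)))[seen + 1]  # position of the latest non-NA value
  x[last_idx]                                    # an NA index yields NA
}

# Same shape as the question's data; the email values are placeholders.
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c("a@example.com", NA, "b@example.com", NA, NA, NA),
                 stringsAsFactors = FALSE)

# mutate() runs locf() separately within each id, so nothing leaks into id 3.
filled <- df %>%
  group_by(id) %>%
  mutate(email = locf(email)) %>%
  ungroup()

filled$email
# "a@example.com" "a@example.com" "b@example.com" "b@example.com" NA NA
```

Because the helper indexes into x rather than rebuilding it, it should behave the same for factor columns, although only the character case is shown here.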
