Problem description
I'm trying to do a Last Observation Carried Forward (LOCF) operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 email = c('[email protected]', NA, 'joe@email.com', NA, NA, NA))

# Expectation: fill() carries each email forward only within its own id group
df2 <- df %>% group_by(id) %>% fill(email)
This results in:
Source: local data frame [6 x 2]
Groups: id [3]
     id              email
  (dbl)             (fctr)
1     1 [email protected]
2     1 [email protected]
3     2      joe@email.com
4     2      joe@email.com
5     3      joe@email.com
6     3      joe@email.com
I expect it to be:
Source: local data frame [6 x 2]
Groups: id [3]
     id              email
  (dbl)             (fctr)
1     1 [email protected]
2     1 [email protected]
3     2      joe@email.com
4     2      joe@email.com
5     3                 NA
6     3                 NA
I expect the latter because group_by's documentation says: "The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed 'by group'." Here the group is determined by the id variable, and the following operation is fill(email). However, it's pretty clearly NOT doing that.
And before anybody asks: it makes no difference if both fields are character instead of numeric or factor.
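To make the "by group" expectation concrete, here is a small base-R sketch (no dplyr or tidyr) that computes LOCF both ways. The helper `locf()` is hypothetical, and `first@example.com` is only a placeholder for the first address, which is not legible in the post. Running LOCF over the whole column lets the id 2 address leak into the id 3 rows, which is the pattern in the observed output; running it separately within each id via base R's `ave()` leaves the id 3 rows as NA, which is the expected output.

```r
# Hypothetical helper: carry the last non-NA value forward within a vector.
# Leading NAs stay NA because there is nothing earlier to carry.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

id    <- c(1, 1, 2, 2, 3, 3)
# "first@example.com" is a placeholder for the first address
email <- c("first@example.com", NA, "joe@email.com", NA, NA, NA)

# Ignoring groups: the value from id 2 spills into the id 3 rows
ungrouped <- locf(email)

# Respecting groups: ave() splits email by id, applies locf() to each piece,
# and puts the results back in the original row order
grouped <- ave(email, id, FUN = locf)

print(data.frame(id, ungrouped, grouped))
```

The only difference between the two columns is whether the fill is restarted at each group boundary, which is exactly what `group_by(id)` was expected to do for `fill(email)`.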
UPDATE: @aosmith pointed out this open issue on GitHub (hadley/tidyr issue #129). I'm going to say that there won't be a proper solution to this problem until that issue is resolved; everything else would just be a workaround. So if somebody makes a successful PR addressing that issue and posts it here, I'd be happy to mark it as the solution.
Luckily, you can still use zoo::na.locf for this:
df %>%
  group_by(id) %>%
  # na.rm = FALSE keeps leading NAs, so each group's vector keeps its length
  mutate(email = zoo::na.locf(email, na.rm = FALSE))
# Source: local data frame [6 x 2]
# Groups: id [3]
#
#      id              email
#   (dbl)             (fctr)
# 1     1 [email protected]
# 2     1 [email protected]
# 3     2      joe@email.com
# 4     2      joe@email.com
# 5     3                 NA
# 6     3                 NA
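If adding zoo as a dependency is undesirable, the same per-group fill can be sketched in base R. The helper `na_locf_keep()` is hypothetical; it is meant to mimic `zoo::na.locf(x, na.rm = FALSE)` by indexing each position to the most recent non-NA value and leaving leading NAs as NA. As above, `first@example.com` is a placeholder for the illegible first address.

```r
# Hypothetical vectorized LOCF that keeps leading NAs (like na.rm = FALSE)
na_locf_keep <- function(x) {
  last <- cumsum(!is.na(x))   # how many non-NA values have been seen so far
  last[last == 0] <- NA       # nothing seen yet: the result stays NA
  x[!is.na(x)][last]          # pick the most recent non-NA value for each position
}

df <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("first@example.com", NA, "joe@email.com", NA, NA, NA),  # placeholder first address
  stringsAsFactors = FALSE
)

# Run the fill separately within each id, analogous to group_by(id) %>% mutate(...)
df$email <- ave(df$email, df$id, FUN = na_locf_keep)
print(df)
```

Indexing with NA returns NA, so a group that is entirely NA (id 3 here) comes back unchanged with its original length, which is the property `na.rm = FALSE` provides in the zoo version.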