Problem description

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result?

Here's an example with fake data.

    library(dplyr)

    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))

    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)

    # Summarise with plyr, keeping categories with a count of zero
    plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)
    #   b count_a
    # 1 1       6
    # 2 2       6
    # 3 3       0

    # Now try it with dplyr
    df %>%
      group_by(b) %>%
      summarise(count_a=length(a), .drop=FALSE)
    #   b count_a .drop
    # 1 1       6 FALSE
    # 2 2       6 FALSE

Not exactly what I was hoping for. Is there a dplyr method for achieving the same result as .drop=FALSE in plyr?

Solution

The issue is still open, but in the meantime, especially since your data are already factored, you can use complete from "tidyr" to get what you might be looking for:

    library(tidyr)

    df %>%
      group_by(b) %>%
      summarise(count_a=length(a)) %>%
      complete(b)
    # Source: local data frame [3 x 2]
    #
    #        b count_a
    #   (fctr)   (int)
    # 1      1       6
    # 2      2       6
    # 3      3      NA

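The reason complete(b) can restore the missing row is that b is a factor, and a factor keeps its full set of levels even when some of them never occur in the data. As a quick check, here is a minimal base R sketch using the same fake data. It relies only on levels() and table(); the variable name counts is introduced here purely for illustration.

```r
# Same fake data as in the question
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

# The unused level "3" is still part of the factor
levels(df$b)

# table() counts every level, including levels with no rows
counts <- table(df$b)
counts
```

If b were a plain integer or character column, there would be no record of the value 3, and complete(b) alone would have nothing to expand to.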
If you wanted the replacement value to be zero, you need to specify that with fill:

    df %>%
      group_by(b) %>%
      summarise(count_a=length(a)) %>%
      complete(b, fill = list(count_a = 0))
    # Source: local data frame [3 x 2]
    #
    #        b count_a
    #   (fctr)   (dbl)
    # 1      1       6
    # 2      2       6
    # 3      3       0

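The answer above predates a later change in dplyr. Assuming dplyr 0.8.0 or newer is installed, group_by() and count() accept a .drop argument, so the empty factor level can be kept without tidyr. The sketch below illustrates this under that version assumption and is not part of the original answer; the names res_group and res_count are introduced here for illustration.

```r
library(dplyr)

# Same fake data as in the question
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

# .drop = FALSE belongs in group_by(), not in summarise()
res_group <- df %>%
  group_by(b, .drop = FALSE) %>%
  summarise(count_a = length(a))

# count() takes the same argument; its count column is named n by default
res_count <- df %>%
  count(b, .drop = FALSE)

res_group
res_count
```

Because the empty group contributes a zero-length vector for a, length(a) evaluates to 0 there, so no fill step is needed. Other summary functions may return NA, NaN, or an error on an empty group, so the complete() approach with an explicit fill can still be the safer choice.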