Problem description
Recently I stumbled upon a strange behaviour of dplyr, and I would be happy if somebody could provide some insight.
Assume I have a data frame in which some columns contain numerical values. In a simple scenario I would like to compute rowSums. Although there are many ways to do it, here are two examples:
library(dplyr)  # provides %>% along with select() and mutate()

df <- data.frame(matrix(rnorm(20), 10, 2),
                 ids = paste("i", 1:10, sep = ""),  # 10 ids to match the 10 rows
                 stringsAsFactors = FALSE)

# works
dplyr::select(df, - ids) %>% {rowSums(.)}

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = dplyr::select(df, - ids) %>% {rowSums(.)})

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = dplyr::select(., - ids) %>% {rowSums(.)})

# workaround:
tmp <- dplyr::select(df, - ids) %>% {rowSums(.)}
df %>%
  dplyr::mutate(blubb = tmp)

# works
rowSums(dplyr::select(df, - ids))

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = rowSums(dplyr::select(df, - ids)))

# workaround
tmp <- rowSums(dplyr::select(df, - ids))
df %>%
  dplyr::mutate(blubb = tmp)
First, I don't really understand what is causing the error, and second, I would like to know how to compute over some (viable) columns in a tidy way.
Edit
The question "mutate and rowSums exclude columns", although related, focuses on using rowSums for the computation. Here I'm eager to understand why the examples above do not work. It is not so much about how to solve the problem (see the workarounds) as about understanding what happens when the naive approach is applied.
Recommended answer
The examples do not work because you are nesting select in mutate and using bare variable names. In this case, select is trying to do something like
> -df$ids
Error in -df$ids : invalid argument to unary operator
which fails because you can't negate a character string (i.e. -"i1" or -"i2" makes no sense). Either of the formulations below works:
df %>% mutate(blubb = rowSums(select_(., "X1", "X2")))
df %>% mutate(blubb = rowSums(select(., -3)))
or
df %>% mutate(blubb = rowSums(select_(., "-ids")))
as suggested by @Haboryme.
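
The failure described above can be reproduced without dplyr. The sketch below is a minimal illustration and not part of the original answer. It assumes only base R, rebuilds a data frame like the one in the question (with 10 ids so the row counts match), shows that unary minus on the character column raises the same error, and then computes the row sums by selecting columns by name. The names `neg_error` and `num_cols` are introduced here purely for illustration.

```r
# Rebuild a data frame like the one in the question (10 rows, 10 ids).
set.seed(1)
df <- data.frame(matrix(rnorm(20), 10, 2),
                 ids = paste("i", 1:10, sep = ""),
                 stringsAsFactors = FALSE)

# Unary minus on a character vector fails; capture the message
# instead of stopping the script.
neg_error <- tryCatch(-df$ids, error = function(e) conditionMessage(e))
print(neg_error)

# Selecting by name (a string) never evaluates `ids` as a column value,
# so there is nothing to negate.
num_cols <- setdiff(names(df), "ids")
df$blubb <- rowSums(df[num_cols])
head(df)
```

Selecting by string plays the same role as the `select_(., "-ids")` form in the answer: the column is identified by its name rather than by evaluating the bare symbol.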