Problem description
I am trying to use a custom function inside a piped mutate statement. I looked at a somewhat similar SO post, but in vain. Say I have a data frame like this (where blob is a variable unrelated to the specific task but part of the full data):
df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)
I have a function that uses the variable names to select columns based on the value in the exclude column and, for example, calculates the sum of the variables not named in exclude (which is always a single character).
FUN <- function(df) {
  sum(df[c('B', 'C', 'D')][!names(df[c('B', 'C', 'D')]) %in% df['exclude']])
}
When I give a single row (row 1) to FUN, I get the expected sum of C and D (the columns not named in exclude), namely 4:
FUN(df[1,])
How do I do the same in a pipe with mutate (adding the result to a variable s)? These two attempts do not work:
df %>% mutate(s=FUN(.))
df %>% group_by(1:n()) %>% mutate(s=FUN(.))
UPDATE: This also does not work as intended:
df %>% rowwise(.) %>% mutate(s=FUN(.))
This works, of course, but it does not use dplyr's mutate (or pipes):
df$s <- sapply(1:nrow(df), function(x) FUN(df[x,]))
If you want to use dplyr, you can do so using rowwise and your function FUN.
library(dplyr)

df %>%
  rowwise() %>%
  do({
    # as_tibble() replaces the deprecated as_data_frame()
    result <- as_tibble(.)
    result$s <- FUN(result)
    result
  })
The same can be achieved using group_by instead of rowwise (as you already tried), but with do instead of mutate:
df %>%
  group_by(1:n()) %>%
  do({
    # as_tibble() replaces the deprecated as_data_frame()
    result <- as_tibble(.)
    result$s <- FUN(result)
    result
  })
The reason mutate doesn't work in this case is that . refers to the whole tibble rather than the current row, so it's like calling FUN(df).
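To make the point about . concrete, here is a minimal sketch that is not part of the original answer. It assumes dplyr 1.1.0 or later, where pick(everything()) inside a rowwise mutate returns the current row as a one-row tibble. The names whole and by_row are only illustrative.

```r
library(dplyr)

df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)

FUN <- function(df) {
  sum(df[c('B', 'C', 'D')][!names(df[c('B', 'C', 'D')]) %in% df['exclude']])
}

# `.` is the whole data frame here, so FUN() returns a single number
# that mutate() recycles to every row.
whole <- df %>% mutate(s = FUN(.))

# Assumption: dplyr >= 1.1.0. pick(everything()) hands FUN() only the
# current row because the data frame is rowwise.
by_row <- df %>%
  rowwise() %>%
  mutate(s = FUN(pick(everything()))) %>%
  ungroup()

print(by_row)
```

In whole, every row carries the same value of s, while by_row holds one sum per row. On older dplyr versions pick() is not available, so the do() variants above remain the fallback.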
A much more efficient way of doing the same thing, though, is to build a logical matrix marking which columns to include in each row and then use rowSums.
cols <- c('B', 'C', 'D')
include_mat <- outer(function(x, y) x != y, X = df$exclude, Y = cols)
# or outer(`!=`, X = df$exclude, Y = cols) if it's more readable to you
df$s <- rowSums(df[cols] * include_mat)
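As a check on the matrix approach, the sketch below prints the inclusion matrix for the sample data and compares the vectorised result with a plain row-by-row loop. It uses base R only. The names s_matrix and s_loop are illustrative, and it assumes every value in exclude names one of the columns in cols.

```r
df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)

cols <- c('B', 'C', 'D')

# One row per data row, one column per candidate variable;
# TRUE means "count this column for this row".
include_mat <- outer(df$exclude, cols, `!=`)
dimnames(include_mat) <- list(NULL, cols)
print(include_mat)

# Multiplying by the logical matrix zeroes out the excluded column.
s_matrix <- rowSums(df[cols] * include_mat)

# Row-by-row reference: sum the columns that are not excluded.
s_loop <- sapply(seq_len(nrow(df)), function(i) {
  sum(unlist(df[i, setdiff(cols, df$exclude[i])]))
})

print(unname(s_matrix))
print(s_loop)
```

Both vectors should agree. The matrix version avoids calling a function once per row, which is where the speed-up on larger data comes from.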