Problem description
When working with plyr, I often found it useful to use adply for scalar functions that I have to apply to each and every row.
For example:
data(iris)
library(plyr)
head(
  adply(iris, 1, transform, Max.Len = max(Sepal.Length, Petal.Length))
)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
1 5.1 3.5 1.4 0.2 setosa 5.1
2 4.9 3.0 1.4 0.2 setosa 4.9
3 4.7 3.2 1.3 0.2 setosa 4.7
4 4.6 3.1 1.5 0.2 setosa 4.6
5 5.0 3.6 1.4 0.2 setosa 5.0
6 5.4 3.9 1.7 0.4 setosa 5.4
Now that I'm using dplyr more, I'm wondering if there is a tidy/natural way to do this, because this is NOT what I want:
library(dplyr)
head(
mutate(iris, Max.Len= max(Sepal.Length,Petal.Length))
)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
1 5.1 3.5 1.4 0.2 setosa 7.9
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.9
5 5.0 3.6 1.4 0.2 setosa 7.9
6 5.4 3.9 1.7 0.4 setosa 7.9
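The repeated 7.9 follows from how max() works: it reduces everything it is given to a single value, and mutate() then recycles that value down the column. A minimal base R sketch of that behaviour (the variable name overall is only for illustration):

```r
data(iris)

# max() pools both columns and returns one number for the whole table,
# not one number per row.
overall <- max(iris$Sepal.Length, iris$Petal.Length)

length(overall)  # a single value, which mutate() recycles to every row
```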
Recommended answer
As of dplyr 0.2 (I think), rowwise() is implemented, so the answer to this problem becomes:
iris %>%
rowwise() %>%
mutate(Max.Len= max(Sepal.Length,Petal.Length))
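One thing to be aware of: the result of rowwise() stays row-wise for any later verbs in the pipeline. A hedged sketch of the same call with an ungroup() added at the end, assuming a reasonably recent dplyr; the name out is only for illustration:

```r
library(dplyr)

out <- iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>%
  ungroup()  # drop the row-wise grouping so later verbs see whole columns again

head(out)
```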
Non-rowwise alternative
Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise has been increasingly discouraged, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's "Row-oriented workflows in R with the tidyverse" material to get a good handle on this topic.
The most straightforward way I have found is based on one of Hadley's examples using pmap:
iris %>%
mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))
Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.
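As an illustration of passing more than two arguments, here is a sketch that takes the row-wise maximum over three columns. The choice of columns and the name result are mine, not part of the original answer:

```r
library(dplyr)
library(purrr)

# Each element of the list is one column; pmap_dbl() calls max() once per row
# with one value taken from each column, and returns a numeric vector.
result <- iris %>%
  mutate(Max.Len = pmap_dbl(list(Sepal.Length, Sepal.Width, Petal.Length), max))

head(result)
```

Adding a fourth or fifth column only means extending the list; the call to max stays the same.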
pmap is a good conceptual approach because it reflects the fact that when you're doing row-wise operations you're actually working with tuples from a list of vectors (the columns in a data frame).
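To make the "tuples from a list of vectors" picture concrete, the sketch below pulls the tuples out explicitly for the first three rows of iris. The names cols, tuples and row_max are illustrative only:

```r
library(purrr)

# A data frame is a list of equal-length column vectors.
cols <- as.list(iris[1:3, c("Sepal.Length", "Petal.Length")])

# pmap() walks those vectors in parallel and hands .f one tuple per row;
# list names are matched to the function's argument names.
tuples <- pmap(cols, function(Sepal.Length, Petal.Length) {
  c(Sepal.Length, Petal.Length)
})

# Reducing each tuple gives the row-wise result.
row_max <- map_dbl(tuples, max)
```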