Problem description
This dataframe contains what I'll call the "data":
library(tidyverse)
df_d <- data_frame(key = c("cat", "cat", "dog", "dog"),
                   value_1 = c(1, 2, 3, 4),
                   value_2 = c(2, 4, 6, 8))
Here is a data frame that I intend to use as something like a function look-up table. f is a single-variable function and f2 is a multivariable function:
df_f <- data_frame(key = c("cat", "dog"),
                   f = c(function(x) x^2, function(x) sqrt(x)),
                   f2 = c(function(x) (x[1] + x[2])^2, function(x) sqrt(x[1] + x[2])))
I can easily make a data frame so that any cat row gets the cat functions and any dog row gets the dog functions:
df_both <- left_join(df_d, df_f)
I was able to figure out how to apply each of the f functions to, say, the value_1 column to get:
df_both %>% mutate(result = invoke_map_dbl(f, value_1))
#> # A tibble: 4 x 6
#>   key   value_1 value_2 f      f2     result
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00
My question is: how can I create a column result2 that takes each function in f2 and uses c(value_1, value_2) as its input? If redefining the functions in f2 to be explicit functions of two variables makes things much easier, that's fine too.
Desired output:
#> # A tibble: 4 x 7
#>   key   value_1 value_2 f      f2     result result2
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00    9.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00   36.0
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73    3.00
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00    3.46
(Question motivated by an unfortunately self-deleted question from earlier today.)
Solution
"If re-defining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too."
Yes, that would be a more natural situation here, I think. Otherwise data is stored rowwise, and should possibly be reshaped.
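For completeness, if the functions in f2 are kept as functions of a single length-2 vector (as in the question), one possible sketch is to build that vector row by row with pmap_dbl from purrr. The names df_f_vec and df_vec are made up for this illustration, and tibble() stands in for data_frame(); the rest of this answer takes the two-argument route instead.

```r
library(dplyr)
library(purrr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

# Question-style f2: each function takes ONE vector of length 2
# (df_f_vec is a hypothetical name used only in this sketch)
df_f_vec <- tibble(key = c("cat", "dog"),
                   f2 = list(function(x) (x[1] + x[2])^2,
                             function(x) sqrt(x[1] + x[2])))

# For each row, combine the two values into a vector and hand it to
# that row's function
df_vec <- left_join(df_d, df_f_vec, by = "key") %>%
  mutate(result2 = pmap_dbl(list(f2, value_1, value_2),
                            function(fn, a, b) fn(c(a, b))))

df_vec$result2
```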
Redefining your functions:
df_f <- data_frame(key = c("cat", "dog"),
                   f = c(function(x) x^2, function(x) sqrt(x)),
                   f2 = c(function(x, y) (x + y)^2, function(x, y) sqrt(x + y)))
df_both <- left_join(df_d, df_f)
Now you again use invoke_map_dbl, passing .x as a list, although you need to turn the lists inside out using transpose:
mutate(
  df_both,
  result = invoke_map_dbl(f, value_1),
  result2 = invoke_map_dbl(f2, transpose(list(value_1, value_2)))
)
A set of three-argument functions would then simply extend to
invoke_map_dbl(f3, transpose(list(value_1, value_2, value_3)))
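Newer purrr releases mark the invoke_map family as deprecated, so the same row-by-row calls can also be written with map2_dbl and pmap_dbl. This is a sketch of an alternative, not the method used above; the name res is hypothetical, and tibble() and list() stand in for data_frame() and c().

```r
library(dplyr)
library(purrr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

df_f <- tibble(key = c("cat", "dog"),
               f = list(function(x) x^2, function(x) sqrt(x)),
               f2 = list(function(x, y) (x + y)^2,
                         function(x, y) sqrt(x + y)))

df_both <- left_join(df_d, df_f, by = "key")

res <- df_both %>%
  mutate(
    # one function and one argument per row
    result = map2_dbl(f, value_1, function(fn, x) fn(x)),
    # one function and two arguments per row
    result2 = pmap_dbl(list(f2, value_1, value_2),
                       function(fn, x, y) fn(x, y))
  )

res
```

A three-argument column would follow the same pattern by adding value_3 to the list and a third parameter to the helper function.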
Note that this kind of approach will not scale well to large datasets: each function is called once per row, so you aren't using vectorization.
A more scalable alternative may involve nesting, where you at least apply each function once within each group:
df_both %>%
  group_by(key) %>%
  nest() %>%
  mutate(data = map(
    data,
    ~ mutate(., result = first(f)(value_1), result2 = first(f2)(value_1, value_2))
  )) %>%
  unnest(data)
Which gives the same result.
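A grouped mutate appears to achieve the same once-per-group application without nesting. The sketch below assumes, as the nested version does, that every row of a group carries the same function, so taking the first element of the list column is safe; the name res_grouped is hypothetical, and tibble() and list() stand in for data_frame() and c().

```r
library(dplyr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

df_f <- tibble(key = c("cat", "dog"),
               f = list(function(x) x^2, function(x) sqrt(x)),
               f2 = list(function(x, y) (x + y)^2,
                         function(x, y) sqrt(x + y)))

df_both <- left_join(df_d, df_f, by = "key")

res_grouped <- df_both %>%
  group_by(key) %>%
  mutate(
    # f[[1]] is the group's function; it is called once on the whole
    # value_1 vector of that group rather than once per row
    result = f[[1]](value_1),
    result2 = f2[[1]](value_1, value_2)
  ) %>%
  ungroup()

res_grouped
```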