I want to speed up a function for creating a pairwise matrix that describes the number of times an object is selected before
and after all other objects, within a set of locations.

Here is an example df:

    df <- data.frame(Shop = c("A","A","A","B","B","C","C","D","D","D","E","E","E"),
                     Fruit = c("apple", "orange", "pear",
                               "orange", "pear",
                               "pear", "apple",
                               "pear", "apple", "orange",
                               "pear", "apple", "orange"),
                     Order = c(1, 2, 3,
                               1, 2,
                               1, 2,
                               1, 2, 3,
                               1, 1, 1))

In each Shop, Fruit is picked by a customer in a given Order.

The following function creates an m x n pairwise matrix:

    loop.function <- function(df){
      fruits <- unique(df$Fruit)
      nt <- length(fruits)
      mat <- array(dim=c(nt,nt))

      for(m in 1:nt){
        for(n in 1:nt){
          ## filter df for each pair of fruit
          xm <- df[df$Fruit == fruits[m],]
          xn <- df[df$Fruit == fruits[n],]

          ## index instances when a pair of fruit are picked in same shop
          mm <- match(xm$Shop, xn$Shop)

          ## filter xm and xn based on mm
          xm <- xm[! is.na(mm),]
          xn <- xn[mm[! is.na(mm)],]

          ## assign number of times fruit[m] is picked after fruit[n] to mat[m,n]
          mat[m,n] <- sum(xn$Order < xm$Order)
        }
      }

      row.names(mat) <- fruits
      colnames(mat) <- fruits

      return(mat)
    }

Where mat[m,n] is the number of times fruits[m] is picked after fruits[n]. And mat[n,m] is the number of times fruits[m] is picked before fruits[n]. It is not recorded if pairs of fruit are picked at the same time (e.g.
in Shop E).

See expected output:

    > loop.function(df)
           apple orange pear
    apple      0      0    2
    orange     2      0    1
    pear       1      2    0

You can see here that pear is chosen twice before apple (in Shops C and D), and apple is chosen once before pear (in Shop A).

I am trying to improve my knowledge of vectorization, especially as a replacement for loops, so I want to know how this loop can be vectorized.

(I have a feeling there may be a solution using outer(), but my knowledge of vectorizing functions is still very limited.)

Update

See benchmarking on real data (times = 10000) for loop.function(), tidyverse.function(), loop.function2(), datatable.function() and loop.function.TMS():

    Unit: milliseconds
                        expr        min        lq       mean    median         uq      max neval cld
          loop.function(dat) 186.588600 202.78350 225.724249 215.56575 234.035750 999.8234 10000   e
     tidyverse.function(dat)  21.523400  22.93695  26.795815  23.67290  26.862700 295.7456 10000   c
         loop.function2(dat) 119.695400 126.48825 142.568758 135.23555 148.876100 929.0066 10000   d
     datatable.function(dat)   8.517600   9.28085  10.644163   9.97835  10.766749 215.3245 10000   b
      loop.function.TMS(dat)   4.482001   5.08030   5.916408   5.38215   5.833699  77.1935 10000   a

Probably the most interesting result for me is the performance of tidyverse.function() on the real data. I will have to try to add the Rcpp solutions at a later date; I'm having trouble making them work on the real data.

I appreciate all the interest and answers given to this post. My intention was to learn and improve performance, and there is certainly a lot to learn from all the comments and solutions given. Thanks!

Solution

It does not seem possible to vectorize over the original data frame df.
But if you transform it using reshape2::dcast(), so that there is one row per shop:

    require(reshape2)

    df$Fruit <- as.character(df$Fruit)

    by_shop <- dcast(df, Shop ~ Fruit, value.var = "Order")

    #   Shop apple orange pear
    # 1    A     1      2    3
    # 2    B    NA      1    2
    # 3    C     2     NA    1
    # 4    D     2      3    1
    # 5    E     1      1    1

..., then you can easily vectorize at least for each combination of [m, n]:

    fruits <- unique(df$Fruit)

    outer(fruits, fruits,
          Vectorize(
            function (m, n, by_shop) sum(by_shop[,m] > by_shop[,n], na.rm = TRUE),
            c("m", "n")
          ),
          by_shop)

    #      [,1] [,2] [,3]
    # [1,]    0    0    2
    # [2,]    2    0    1
    # [3,]    1    2    0

This is probably the outer() solution you had in mind. A much faster solution would be a true vectorization over all combinations of fruits [m, n], but I've been thinking about it and I don't see any way to do it. So I had to use the Vectorize function, which still calls the inner function once per pair and is therefore much slower than true vectorization.

Benchmark comparison with your original function:

    Unit: milliseconds
                      expr      min       lq     mean   median       uq      max neval
         loop.function(df) 3.788794 3.926851 4.157606 4.002502 4.090898 9.529923   100
     loop.function.TMS(df) 1.582858 1.625566 1.804140 1.670095 1.756671 8.569813   100

Function & benchmark code (also added the preservation of the dimnames):

    require(reshape2)

    loop.function.TMS <- function(df) {
      df$Fruit <- as.character(df$Fruit)
      by_shop <- dcast(df, Shop ~ Fruit, value.var = "Order")
      fruits <- unique(df$Fruit)
      o <- outer(fruits, fruits,
                 Vectorize(function (m, n, by_shop) sum(by_shop[,m] > by_shop[,n], na.rm = TRUE),
                           c("m", "n")),
                 by_shop)
      colnames(o) <- rownames(o) <- fruits
      o
    }

    require(microbenchmark)

    microbenchmark(loop.function(df), loop.function.TMS(df))
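For readers wondering what a version without the per-pair calls of Vectorize() might look like, here is a minimal base-R sketch. It is not part of the answer above and has not been benchmarked against it. It assumes each (Shop, Fruit) pair occurs at most once, and the names pairwise_vectorized, W, idx_m and idx_n are hypothetical, chosen only for this illustration. The idea is to repeat the columns of the wide shop-by-fruit matrix so that every ordered pair of fruits is compared in a single `>` operation.

```r
# Hypothetical helper (not from the original answer): for every ordered pair
# of fruits, counts the shops where fruit m was picked after fruit n.
pairwise_vectorized <- function(df) {
  shops  <- unique(as.character(df$Shop))
  fruits <- unique(as.character(df$Fruit))
  nf <- length(fruits)

  # Wide shops x fruits matrix of pick orders; NA where a fruit was not picked.
  # Assumption: each (Shop, Fruit) pair occurs at most once in df.
  W <- matrix(NA_real_, nrow = length(shops), ncol = nf,
              dimnames = list(shops, fruits))
  W[cbind(match(df$Shop, shops), match(df$Fruit, fruits))] <- df$Order

  # Column indices for all nf^2 ordered pairs, with m varying fastest so the
  # sums can be folded back into an nf x nf matrix in column-major order.
  idx_m <- rep(seq_len(nf), times = nf)
  idx_n <- rep(seq_len(nf), each = nf)

  # One comparison over a shops x nf^2 matrix, then one column sum per pair.
  after <- W[, idx_m, drop = FALSE] > W[, idx_n, drop = FALSE]
  matrix(colSums(after, na.rm = TRUE), nrow = nf, ncol = nf,
         dimnames = list(fruits, fruits))
}

df <- data.frame(Shop  = c("A","A","A","B","B","C","C","D","D","D","E","E","E"),
                 Fruit = c("apple", "orange", "pear", "orange", "pear",
                           "pear", "apple", "pear", "apple", "orange",
                           "pear", "apple", "orange"),
                 Order = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3, 1, 1, 1))

pairwise_vectorized(df)
#        apple orange pear
# apple      0      0    2
# orange     2      0    1
# pear       1      2    0
```

The trade-off is memory: the intermediate logical matrix has one column per ordered pair, so it grows with the square of the number of fruits. Whether this is faster than loop.function.TMS() on real data would need to be measured.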