Problem description
Based on what I have read, vectorization is a form of parallelization known as SIMD: it allows a processor to execute the same instruction (such as addition) on an array of data simultaneously.
However, I got confused when reading The Relationship between Vectorized and Devectorized Code regarding Julia's and R's vectorization performance. The post claims that devectorized Julia code (written with loops) is faster than the vectorized code in both Julia and R, because:
It claims that R turns vectorized code, written in R, into devectorized code in C. If vectorization is faster (as a form of parallelization), why would R devectorize the code, and why is that a plus?
"Vectorization" in R is vector processing from the R interpreter's point of view. Take the function cumsum as an example. On entry, the R interpreter sees that a vector x is passed into this function, but the work is then handed off to C code that the interpreter cannot analyze or trace. While C is doing the work, R is simply waiting. By the time the R interpreter comes back to work, the whole vector has been processed. So from R's point of view, it issued a single instruction but a whole vector of data was processed. This is an analogy to the concept of SIMD: "single instruction, multiple data".
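For instance, here is a minimal sketch (cumsum_loop is just an illustrative name I made up for comparison): cumsum hands the whole vector to compiled code in one call, while an equivalent R-level loop is read by the interpreter one iteration at a time.
x <- runif(10)
## compiled loop: one call, the whole vector is processed in C
cumsum(x)
## interpreted loop: a hypothetical R-level equivalent of cumsum
cumsum_loop <- function(x) {
    out <- numeric(length(x))
    s <- 0
    for (i in seq_along(x)) {
        s <- s + x[i]
        out[i] <- s
    }
    out
}
all.equal(cumsum(x), cumsum_loop(x))  # TRUE; only the speed differs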
Not only are functions like cumsum, which take a vector and return a vector, regarded as "vectorized" in R; functions like sum, which take a vector and return a scalar, are also called "vectorized".
Simply put: whenever R calls compiled code to run a loop, it is "vectorization". If you wonder why this kind of "vectorization" is useful, it is because a loop written in a compiled language is faster than a loop written in an interpreted language. A C loop is translated into machine language that the CPU can execute directly, whereas for an R loop the CPU needs the R interpreter's help to read it, iteration by iteration. It is like this: if you speak Chinese (the hardest human language), you can respond quickly to someone speaking Chinese to you; otherwise, you need a translator to render each Chinese sentence into English, you reply in English, and the translator turns your reply back into Chinese sentence by sentence. The efficiency of communication drops dramatically.
x <- runif(1e+7)
## R loop
system.time({
    sumx <- 0
    for (x0 in x) sumx <- sumx + x0
    sumx
})
# user system elapsed
# 1.388 0.000 1.347
## C loop
system.time(sum(x))
# user system elapsed
# 0.032 0.000 0.030
Be aware that "vectorization" in R is only an analogy to SIMD, not the real thing. Real SIMD uses the CPU's vector registers for computation and is therefore true parallel computing via data parallelism. R is not a language in which you can program CPU registers; for that, you have to write compiled code or assembly.
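For example, here is a sketch of what "writing compiled code" means in practice, assuming the Rcpp package and a working C++ toolchain (sum_cpp is a made-up name). Note that it is the C++ compiler, not R, that decides whether this loop gets compiled to SIMD instructions.
library(Rcpp)
## sum_cpp is just an illustrative name: a plain C++ loop callable from R.
## Whether it ends up using the CPU's vector registers is entirely up to
## the C++ compiler's optimizer.
cppFunction('
double sum_cpp(NumericVector x) {
    double s = 0.0;
    for (int i = 0; i < x.size(); i++) s += x[i];
    return s;
}')
sum_cpp(runif(1e+7))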
R's "vectorization" does not care how a loop written in a compiled language is actually executed; after all, that is beyond the R interpreter's knowledge. As to whether such compiled code is executed with SIMD, read: Does R leverage SIMD when doing vectorized calculations?
More on "vectorization" in R
I am not a Julia user, but Bogumił Kamiński has demonstrated an impressive feature of that language: loop fusion. Julia can do this because, as he points out, "vectorization in Julia is implemented in Julia", not outside the language.
This reveals a downside of R's vectorization: speed often comes at the price of memory usage. I am not saying that Julia won't have this problem (I don't use it, so I don't know), but it is definitely true for R.
Here is an example: Fastest way to compute row-wise dot products between two skinny tall matrices in R. rowSums(A * B) is "vectorization" in R, as both "*" and rowSums are coded in C as loops. However, R cannot fuse them into a single C loop, so the temporary matrix C = A * B has to be generated in RAM first.
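To illustrate what such a fusion would look like if written by hand, here is a sketch assuming the Rcpp package (row_dot is a made-up name): the two C loops collapse into one, and the temporary matrix A * B is never allocated.
library(Rcpp)
## row_dot is just an illustrative name: one fused C++ loop that multiplies
## and accumulates row by row, without materializing the matrix A * B
cppFunction('
NumericVector row_dot(NumericMatrix A, NumericMatrix B) {
    int n = A.nrow(), p = A.ncol();
    NumericVector out(n);
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < p; j++) s += A(i, j) * B(i, j);
        out[i] = s;
    }
    return out;
}')
A <- matrix(runif(2e+6), nrow = 1e+5)  # skinny tall matrices
B <- matrix(runif(2e+6), nrow = 1e+5)
all.equal(row_dot(A, B), rowSums(A * B))  # TRUE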
Another example is R's recycling rule, or any computation relying on that rule. For example, when you add a scalar a to a matrix A via A + a, what really happens is that a is first replicated into a matrix B with the same dimensions as A, i.e., B <- matrix(a, nrow(A), ncol(A)), and then an addition between the two matrices is computed: A + B. Clearly the generation of the temporary matrix B is undesirable, but sorry, you can't do better unless you write your own C function for A + a and call it from R. This is described as "such a fusion is possible only if explicitly implemented" in Bogumił Kamiński's answer.
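A minimal sketch of the equivalence described above (tiny sizes, for illustration only):
A <- matrix(runif(12), nrow = 3, ncol = 4)
a <- 2
## recycling a scalar gives the same result as explicitly expanding it
B <- matrix(a, nrow(A), ncol(A))
identical(A + a, A + B)  # TRUE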
To deal with the memory effects of many temporary results, R has a sophisticated mechanism called "garbage collection". It helps, but memory can still explode if you generate some really big temporary result somewhere in your code. A good example is the function outer: I have written many answers using it, but it is particularly memory-unfriendly. In the same spirit, c(crossprod(x, y)) is better than sum(x * y), because it does not materialize the temporary vector x * y.
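A short sketch of that last point (the results agree up to floating-point rounding; timings and memory use will vary by machine):
x <- runif(1e+7)
y <- runif(1e+7)
## allocates a length-1e7 temporary vector x * y before summing it
sum(x * y)
## computes the dot product in compiled BLAS code, with no such temporary
c(crossprod(x, y))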
I may have gone off-topic in this edit, as I started discussing the side effects of "vectorization". Use it with care.