Problem description
My aim is to have a parallel foldr function. At first it seemed rather straightforward to achieve, and this is what I had in mind: first break up the input list into partitions based on the number of cores (numCapabilities), then apply foldr to each partition, which results in one folded value per partition, and finally do a foldr again on that list of results to obtain the final value.
import Control.Parallel.Strategies (parList, rdeepseq, using)
import GHC.Conc (numCapabilities)

listChunkSize = numCapabilities

-- split a list into chunks of length n
chunk n [] = []
chunk n xs = ys : chunk n zs
  where (ys, zs) = splitAt n xs

parfoldr f z [] = z
parfoldr f z xs = res
  where
    parts   = chunk listChunkSize xs                          -- partition the input
    partsRs = map (foldr f z) parts `using` parList rdeepseq  -- fold each chunk in parallel
    res     = foldr f z partsRs                               -- fold the per-chunk results
The above code does not work, because the type of foldr,

(a -> b -> b) -> b -> [a] -> b

implies that the input list type can be different from the accumulator and result type.

For example:

1) foldr (+) 0 [1..10]
=> list type = accumulator type (Integer)

2) foldr (\i acc -> (i > 5) && acc) True [1..10]
=> list type (Integer) != accumulator type (Bool)

So, looking at the code above, the map generates a list of type b, which is then passed as an argument to the second foldr. But the second foldr accepts a list of type a, so this won't work.
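Concretely, with example 2 (so a = Integer, b = Bool), the intermediate values in parfoldr would have to have these types:

-- f       :: Integer -> Bool -> Bool,  z :: Bool
-- parts   :: [[Integer]]
-- partsRs :: [Bool]            -- from map (foldr f z) parts
-- res     = foldr f z partsRs  -- ill-typed: f expects Integer list elements,
--                              -- but partsRs has Bool elements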
An ugly solution would be to give parfoldr a different type signature, e.g.

parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a

This works, but then it is not exactly equivalent to foldr: example 1 above would do just fine, but not example 2. So, question 1 is: how can parfoldr be defined with the same type signature as foldr?
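For what it's worth, here is one sketch that keeps foldr's exact signature by folding each chunk into a continuation of type b -> b and composing the continuations (parfoldr2 is a hypothetical name; it reuses chunk and listChunkSize from above):

import Control.Parallel.Strategies (parList, rseq, using)

-- Folding each chunk into a continuation and composing the continuations
-- right-to-left gives the same result as folding the whole list, since
-- foldr f z (c1 ++ c2) == foldr f (foldr f z c2) c1
parfoldr2 :: (a -> b -> b) -> b -> [a] -> b
parfoldr2 f z xs = foldr ($) z conts
  where
    parts = chunk listChunkSize xs
    conts = map (\p acc -> foldr f acc p) parts `using` parList rseq

The catch: rseq only evaluates each continuation to weak head normal form (rdeepseq is not an option here, since functions have no NFData instance), so the real folding work still happens sequentially when the composed function is finally applied to z. The signature is recovered, but not the speedup; the answer below explains why.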
Comparing the 2 folds:

input = [1..1000000]
seqfold = foldr (+) 0
parfold = parfoldr (+) 0

I get the following times on a dual-core machine:

(no -threaded flag)
$ ./test
seqfold: 4.99s
parfold: 25.16s

(-threaded flag on)

$ ./test
seqfold: 5.32s
parfold: 25.55s

$ ./test +RTS -N1
seqfold: 5.32s
parfold: 25.53s

$ ./test +RTS -N2
seqfold: 3.48s
parfold: 3.68s

$ ./test +RTS -N3
seqfold: 3.57s
parfold: 2.36s

$ ./test +RTS -N4
seqfold: 3.03s
parfold: 1.70s
Observations from these measurements:

- foldr seems to give lower runtime when the number of cores is increased. Why is that?
- parfold gives better runtimes for N >= 3.

Any suggestions and ideas for improvement are appreciated :)
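One way to dig into the "why" (my suggestion, not part of the original post) is to build with optimizations and look at the runtime statistics, which report GC time and, for threaded builds, how many sparks were actually converted into parallel work:

$ ghc -O2 -threaded -rtsopts test.hs
$ ./test +RTS -N2 -s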
Recommended answer

foldr is not in general parallelizable, as its interface allows sequential dependencies. In order to be able to rearrange the computations in the way you described, you'll need to limit yourself to associative operators with an identity element. This is known as a monoid, and what you've implemented is essentially a parallel mconcat.
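As a rough illustration of that suggestion (parMconcat and chunksOf are hypothetical names, not from the answer), a chunked parallel mconcat could look something like this:

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, using)
import Data.Monoid (All (..), Sum (..))

-- Because mappend is associative and mempty is an identity, the per-chunk
-- results can be combined in any grouping, so chunking is safe here.
parMconcat :: (Monoid m, NFData m) => Int -> [m] -> m
parMconcat n xs =
    mconcat (map mconcat (chunksOf n xs) `using` parList rdeepseq)
  where
    chunksOf _ [] = []
    chunksOf k ys = let (front, rest) = splitAt k ys
                    in front : chunksOf k rest

-- example 1: getSum (parMconcat 100000 (map Sum [1..1000000]))
-- example 2: getAll (parMconcat 4 (map (All . (> 5)) [1..10]))

Mapping the elements into a monoid first (Sum for example 1, All for example 2) is what sidesteps the a-versus-b type mismatch from the question: the combining step only ever sees values of the monoid type.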