This article discusses a more general parfoldr and may serve as a useful reference for anyone tackling a similar problem.

Problem Description


    My aim is to have a parallel foldr function. At first, it seemed rather straightforward to achieve, and this is what I had in mind:

    First break up the input list into partitions based on the number of cores (numCapabilities). Then apply foldr to each partition, which will result in a list of folded values for each partition. Then do a foldr again on that list to obtain the final value.

        import Control.Parallel.Strategies (using, parList, rdeepseq)
        import GHC.Conc (numCapabilities)

        -- chunk size: the number of available cores
        listChunkSize = numCapabilities

        -- split the input into chunks of n elements
        chunk n [] = []
        chunk n xs = ys : chunk n zs
          where (ys,zs) = splitAt n xs

        -- fold each chunk in parallel, then fold the per-chunk results
        parfoldr f z [] = z
        parfoldr f z xs = res
          where
                parts   = chunk listChunkSize xs
                partsRs = map (foldr f z) parts `using` parList rdeepseq
                res     = foldr f z partsRs
    

    The above code does not work because obviously the definition of foldr, (a -> b -> b) -> b -> [a] -> b, implies that the input list type is (well, can be) different from the accumulator and result type.

    For example,

    1) foldr (+) 0 [1..10] => list type = accumulator type (Integer)

    2) foldr (\i acc -> (i>5) && acc) True [1..10] => list type (Integer) != accumulator type (Bool)

    So, looking at my code above, the map will generate a list of type b, which is then passed as an argument to the second foldr. But the second foldr accepts a list of type a. So, that won't work.
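
    To make the mismatch concrete, here are the intermediate types for example 2 written out as a sketch (step is just a name I am giving to the lambda; chunk is the helper from the code above):

        step :: Integer -> Bool -> Bool
        step i acc = (i > 5) && acc

        parts :: [[Integer]]               -- chunks of the input
        parts = chunk 4 [1..10]

        partsRs :: [Bool]                  -- each chunk folds down to a Bool
        partsRs = map (foldr step True) parts

        -- foldr step True partsRs        -- rejected: step consumes Integer
        --                                -- elements, but partsRs :: [Bool]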

    An ugly solution would be to provide a different type signature for the parfoldr, e.g.

        parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a

    This will work, but then it is not exactly equivalent to foldr. Example 1 above will do just fine, but not example 2. So, question 1 is: how to define parfoldr to have the same type signature as foldr?
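
    For what it is worth, a sketch of how that restricted definition might look (it reuses chunk, listChunkSize and the Strategies imports from the snippet above and simply replaces the earlier parfoldr; this is only my reading of the "ugly solution", not code from the original test program):

        import Control.DeepSeq (NFData)

        parfoldr :: (NFData a) => (a -> a -> a) -> a -> [a] -> a
        parfoldr _ z [] = z
        parfoldr f z xs = foldr f z partsRs
          where
            parts   = chunk listChunkSize xs
            partsRs = map (foldr f z) parts `using` parList rdeepseq

    This compiles and handles example 1, but example 2 can no longer even be written, because the element type and the accumulator type are forced to be the same a.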

    Comparing the 2 folds:

        input = [1..1000000]
        seqfold = foldr (+) 0
        parfold = parfoldr (+) 0
    

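    The question does not show the test driver itself; below is a minimal sketch of one, assuming the restricted parfoldr from earlier is in scope (the module layout and the use of Data.Time.Clock are my own assumptions, the original program may well differ). For the multi-core runs it would be compiled with the -threaded flag, e.g. ghc -O2 -threaded test.hs, so that +RTS -N takes effect.

        import Control.DeepSeq (force)
        import Control.Exception (evaluate)
        import Data.Time.Clock (getCurrentTime, diffUTCTime)

        -- time the full evaluation of an Integer result
        timeIt :: String -> Integer -> IO ()
        timeIt label x = do
          t0 <- getCurrentTime
          _  <- evaluate (force x)
          t1 <- getCurrentTime
          putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))

        main :: IO ()
        main = do
          let input = [1..1000000] :: [Integer]
          timeIt "seqfold" (foldr (+) 0 input)
          timeIt "parfold" (parfoldr (+) 0 input)
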
    I get the following times on a dual-core machine:

    (no -threaded flag)

        $ ./test
        seqfold: 4.99s
        parfold: 25.16s
    

    (-threaded flag on)

        $ ./test
        seqfold: 5.32s
        parfold: 25.55s
        $ ./test +RTS -N1
        seqfold: 5.32s
        parfold: 25.53s
        $ ./test +RTS -N2
        seqfold: 3.48s
        parfold: 3.68s
        $ ./test +RTS -N3
        seqfold: 3.57s
        parfold: 2.36s
        $ ./test +RTS -N4
        seqfold: 3.03s
        parfold: 1.70s
    

    Observations from these measurements:

    • foldr seems to give a lower runtime when the number of cores is increased. Why is that?

    • parfold gives better runtimes for N >= 3.

    Any suggestions and ideas for improvement are appreciated :)

    Solution

    foldr is not in general parallelizable, as its interface allows sequential dependencies. In order to be able to rearrange the computations in the way you described, you'll need to limit yourself to associative operators with an identity element. This structure is known as a monoid, and what you've implemented is essentially a parallel mconcat.
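
    To make the answer concrete, here is a minimal sketch of such a parallel mconcat (the name parMconcat and the chunking strategy are my own choices, not part of the answer; it relies on the NFData instances that the deepseq package provides for the monoid wrappers):

        import Control.Parallel.Strategies (using, parList, rdeepseq)
        import Control.DeepSeq (NFData)
        import Data.Monoid (Sum(..))

        -- split a list into chunks of n elements
        chunk :: Int -> [a] -> [[a]]
        chunk _ [] = []
        chunk n xs = ys : chunk n zs
          where (ys, zs) = splitAt n xs

        -- fold each chunk to a single monoidal value in parallel, then
        -- combine the per-chunk results; this reordering is valid exactly
        -- because (<>) is associative and mempty is its identity
        parMconcat :: (Monoid m, NFData m) => Int -> [m] -> m
        parMconcat chunkSize xs = mconcat partsRs
          where
            parts   = chunk chunkSize xs
            partsRs = map mconcat parts `using` parList rdeepseq

        -- summation expressed through the Sum monoid
        main :: IO ()
        main = print (getSum (parMconcat 100000 (map Sum [1..1000000 :: Integer])))

    In this shape the per-chunk folds and the final combine both use the same associative operation, which is exactly the restriction the answer describes; a foldr with an arbitrary (a -> b -> b) step cannot be split up this way.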

That concludes this article on a more general parfoldr; hopefully the answer above proves helpful.
