本文介绍了为什么这个总理测试如此缓慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这段代码摘自Haskell Road to Logic,Math and Programming一书。它实现了eratosthenes算法的筛选并解决了项目欧拉问题10.

  sieve :: [Integer]  - > [整数] 
筛号(0:xs)=筛号xs
筛号(n:xs)= n:筛号(标记xs 1 n)
其中
mark :: [Integer ] - >整数 - >整数 - > [整数]
标记(y:ys)k m | k == m = 0:(mark ys 1 m)
|否则= y:(标记ys(k + 1)m)

素数:: [整数]
素数=筛子[2 ..]

- Project Euler#10
main = print $ sum $ takeWhile(< 2000000)primes

事实上,它的运行速度甚至比天真的素数测试还要慢。
有人可以解释这种行为吗?



我怀疑它与迭代mark函数列表中的每个元素有关。



Thanks。

解决方案

您正在使用此算法构建二次数未评估的thunk。该算法非常依赖于懒惰,这也是它不能缩放的原因。



让我们来看看它是如何工作的,希望能够让问题变得明显。简单地说,假设我们想要 print primes ad infinitum的元素,也就是说我们想要评估每个单元格一个接一个地在列表中。 primes 定义为:

 素数:: [Integer] 
primes = sieve [2 ..]

由于2不是0,所以第二个定义 sieve 适用,并且2被添加到素数列表中,并且列表的其余部分是未评估的thunk(我使用 tail 而不是 n:xs 在 sieve 中为 xs ,所以 tail 实际上并未被调用,并且不会在下面的代码中添加任何开销; mark $ b

  primes = 2:筛号(mark(tail [2 ..])) 1 2)

现在我们需要第二个素数元素。因此,我们通过代码(读者的练习)并结束:

  primes = 2:3:sieve( mark(尾(mark(tail [2 ..])1 2))1 3)

程序再次,我们要评估下一个素数...

  primes = 2:3:5:sieve(mark(tail (尾巴(标记(尾巴[2 ..])1 2))1 3)))1 5)

这看起来像LISP,但我离题了......开始看到问题了?对于 primes 列表中的每个元素,必须评估越来越多的标记应用程序堆栈。换句话说,对于列表中的每个元素,通过评估每个标记应用程序中的每个元素,都必须检查该元素是否被前面的任何素数标记。叠加。因此,对于 n〜= 2000000 ,Haskell运行时必须调用函数,调用堆栈的深度大约为......我不知道,137900( let n = 2e6 in n / log n 给出一个下界)?类似的东西。这可能是导致减速的原因;也许可以告诉你更多(我没有计算机与Haskell和一个图形用户界面现在)。

Eratosthenes的筛选工作在像C这样的语言中的原因是:


  1. 您没有使用无限列表。

  2. 由于(1),您可以在继续下一步之前标记整个数组 n ,完全没有调用堆栈开销。


This code was taken from the book "Haskell Road to Logic, Math and Programming". It implements sieve of eratosthenes algorithm and solves Project Euler Problem 10.

sieve :: [Integer] -> [Integer]
sieve (0 : xs) = sieve xs
sieve (n : xs) = n : sieve (mark xs 1 n)
  where
    mark :: [Integer] -> Integer -> Integer -> [Integer]
    mark (y:ys) k m | k == m = 0 : (mark ys 1 m)
                    | otherwise = y : (mark ys (k+1) m)

primes :: [Integer]
primes = sieve [2..]

-- Project Euler #10
main = print $ sum $ takeWhile (< 2000000) primes

Actually it runs even slower then the naive prime test.Can someone explain this behaivour?

I suspect it has something to do with iterating each element in the list in the mark function.

Thanks.

解决方案

You are building up a quadratic number of unevaluated thunks using this algorithm. The algorithm relies on laziness so heavily, that this also is the reason why it doesn't scale.

Let's walk through how it works, which hopefully should make the problem apparent. For simplicitly, let's say that we want to print the elements of primes ad infinitum, i.e. we want to evaluate each cell in the list one after the other. primes is defined as:

primes :: [Integer]
primes = sieve [2..]

Since 2 isn't 0, the second definition of sieve applies, and 2 is added to the list of primes, and the rest of the list is an unevaluated thunk (I use tail instead of the pattern match n : xs in sieve for xs, so tail isn't actually being called, and doesn't add any overhead in the code below; mark is actually the only thunked function):

primes = 2 : sieve (mark (tail [2..]) 1 2)

Now we want the second primes element. So, we walk through the code (exercise for the reader) and end up with:

primes = 2 : 3 : sieve (mark (tail (mark (tail [2..]) 1 2)) 1 3)

Same procedure again, we want to evaluate the next prime...

primes = 2 : 3 : 5 : sieve (mark (tail (tail (mark (tail (mark (tail [2..]) 1 2)) 1 3))) 1 5)

This is starting to look like LISP, but I digress... Starting to see the problem? For each element in the primes list, an increasingly large thunk of stacks of mark applications have to be evaluated. In other words, for each element in the list, there has to be a check for whether that element is marked by any of the preceding primes, by evaluating each mark application in the stack. So, for n~=2000000, the Haskell runtime has to call functions resulting in a call stack with a depth of about ... I don't know, 137900 (let n = 2e6 in n / log n gives a lower bound)? Something like that. This is probably what causes the slow-down; maybe vacuum can tell you more (I don't have a computer with both Haskell and a GUI right now).

The reason why the sieve of Eratosthenes works in languages like C is that:

  1. You aren't using an infinite list.
  2. Because of (1), you can mark the whole array before continuing with the next n, resulting in no call stack overheads at all.

这篇关于为什么这个总理测试如此缓慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-30 07:14