Question
After reading the passage about persistence in Okasaki's Purely Functional Data Structures and going over his illustrative examples about singly linked lists (which is how Haskell's lists are implemented), I was left wondering about the space complexities of Data.List's inits and tails.
It seems to me that

- the space complexity of tails is linear in the length of its argument, and
- the space complexity of inits is quadratic in the length of its argument,
but a simple benchmark indicates otherwise.
With tails, the original list can be shared. Computing tails xs simply consists in walking along list xs and creating a new pointer to each element of that list; no need to recreate part of xs in memory.
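For comparison, a sharing-based definition of tails might look like the sketch below (my own sketch, not necessarily the exact Data.List source); every suffix in the result is a pointer into the original list rather than a copy:

tailsShared :: [a] -> [[a]]
tailsShared xs = xs : case xs of
  []      -> []                 -- the last suffix is the empty list
  _ : xs' -> tailsShared xs'    -- reuse the existing tail; no cons cells are copied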
In contrast, because each element of inits xs "ends in a different way", there can be no such sharing, and all the possible prefixes of xs must be recreated from scratch in memory.
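A small illustration of the point (my own example, using the standard inits): the prefixes end in different cons cells, so none of their spines can be reused.

import Data.List (inits)

prefixDemo :: [[Int]]
prefixDemo = inits [1, 2, 3]
-- prefixDemo == [[], [1], [1,2], [1,2,3]]
-- The prefix [1,2] ends in the cell (2 : []), whereas [1,2,3] holds its 2
-- in the cell (2 : [3]); the cells differ, so each prefix needs its own
-- freshly allocated spine.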
The simple benchmark below shows there isn't much of a difference in memory allocation between the two functions:
-- Main.hs
import Data.List (inits, tails)

main = do
  let intRange = [1 .. 10 ^ 4] :: [Int]
  print $ sum intRange
  print $ fInits intRange
  print $ fTails intRange

fInits :: [Int] -> Int
fInits = sum . map sum . inits

fTails :: [Int] -> Int
fTails = sum . map sum . tails
After compiling my Main.hs file with

ghc -prof -fprof-auto -O2 -rtsopts Main.hs

and running

./Main +RTS -p

the Main.prof file reports the following:
COST CENTRE  MODULE  %time  %alloc
fInits       Main     60.1    64.9
fTails       Main     39.9    35.0
The memory allocated for fInits and that allocated for fTails have the same order of magnitude... Hum...
- Are my conclusions about the space complexities of tails (linear) and inits (quadratic) correct?
- If so, why does GHC allocate roughly as much memory for fInits and fTails? Does list fusion have something to do with this?
- Or is my benchmark flawed?
Answer
The implementation of inits in the Haskell Report, which is identical or nearly identical to the implementations used up to base 4.7.0.1 (GHC 7.8.3), is horribly slow. In particular, the fmap applications stack up recursively, so forcing successive elements of the result gets slower and slower.
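For reference, the Report-style definition is roughly the following (sketched from memory, so the exact base source may differ slightly):

inits :: [a] -> [[a]]
inits []       = [[]]
inits (x : xs) = [] : fmap (x :) (inits xs)

Unfolding it on a small input shows the stacked fmaps: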
inits [1,2,3,4] = [] : fmap (1:) (inits [2,3,4])
                = [] : fmap (1:) ([] : fmap (2:) (inits [3,4]))
                = [] : [1] : fmap (1:) (fmap (2:) ([] : fmap (3:) (inits [4])))
                ....
The simplest asymptotically optimal implementation, explored by Bertram Felgenhauer, is based on applying take with successively larger arguments:
-- Requires the BangPatterns extension for the strict counter !l.
inits xs = [] : go (1 :: Int) xs
  where
    go !l (_ : ls) = take l xs : go (l + 1) ls
    go _  []       = []
Felgenhauer was able to eke some extra performance out of this using a private, non-fusing version of take, but it was still not as fast as it could be.
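For illustration only, a private non-fusing take could be as simple as a plain recursive copy that none of base's fusion rewrite rules know about; this is my guess at the idea, not Felgenhauer's actual code:

-- Hypothetical sketch: a local copy of take with no fusion rules attached.
takeNoFuse :: Int -> [a] -> [a]
takeNoFuse n _ | n <= 0 = []
takeNoFuse _ []         = []
takeNoFuse n (x : xs)   = x : takeNoFuse (n - 1) xs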
The following very simple implementation is significantly faster in most cases:
inits = map reverse . scanl (flip (:)) []
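To see what it computes, here is a small worked example (my own annotation of the intermediate values):

-- scanl (flip (:)) [] [1,2,3] builds the reversed prefixes incrementally:
--   [[], [1], [2,1], [3,2,1]]
-- map reverse then turns each one into the corresponding prefix:
--   [[], [1], [1,2], [1,2,3]]
initsDemo :: [[Int]]
initsDemo = (map reverse . scanl (flip (:)) []) [1, 2, 3]
-- initsDemo == [[], [1], [1,2], [1,2,3]]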
In some weird corner cases (like map head . inits), this simple implementation is asymptotically non-optimal. I therefore wrote a version using the same technique, but based on Chris Okasaki's Banker's queues, that is both asymptotically optimal and nearly as fast. Joachim Breitner optimized it further, primarily by using a strict scanl' rather than the usual scanl, and this implementation got into GHC 7.8.4. inits can now produce the spine of the result in O(n) time; forcing the entire result requires O(n^2) time because none of the conses can be shared among the different initial segments. If you want really absurdly fast inits and tails, your best bet is to use Data.Sequence; Louis Wasserman's implementation is magical. Another possibility would be to use Data.Vector, which presumably uses slicing for such things.
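As a pointer in that direction, Data.Sequence exposes its own inits and tails on Seq; the small example below is mine, just to show the shape of that API:

import qualified Data.Sequence as Seq

-- Seq is a finger tree; its inits/tails are the fast implementations
-- referred to above.
seqInitsDemo :: Seq.Seq (Seq.Seq Int)
seqInitsDemo = Seq.inits (Seq.fromList [1 .. 5])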