问题描述
全部.
在尝试解决一些编程测验时: https://www.hackerrank.com/challenges/missing-numbers ,我遇到了空间泄漏.
While trying to solve some programming quiz:https://www.hackerrank.com/challenges/missing-numbers, I came across with space leak.
主要功能是difference
,它实现了多组差异.我发现列表':'和三元组(,,)保留在堆中-hT选项分析.但是,只有大列表是difference
的两个参数,并且随着difference
继续尾递归而缩小.但是列表所消耗的内存随着程序的运行而不断增加.
Main function is difference
, which implements multi-set difference.I've found out that List ':' and Triples (,,) kept on heapswith -hT option profiling. However, only big lists are difference
'stwo arguments, and it shrinks as difference
keeps on tail recursion.But the memory consumed by lists keeps increasing as program runs.
Triples是临时数组结构,用于簿记多集的每个元素的计数.但是三倍消耗的内存也持续增长,我找不到原因.
Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples alsokeeps increasing, and I cannot find out why.
尽管我在stackoverflow中浏览了类似的空间泄漏"问题,我不明白这个主意.当然,我有很多东西要学习.
Though I've browsed similar 'space leak' questions in stackoverflow,I couldn't grasp the idea. Surely I have much to study.
我感谢您的任何评论.谢谢.
I appreciate any comments. Thank you.
p.s)可执行文件是使用-O2开关编译的.
p.s) executable is compiled with -O2 switch.
$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
.
import Data.List
import Data.Array
-- array (non-zero-count, start-offset, array_data)
array_size=101
myindex :: Int -> Int -> Int
myindex key offset
| key >= offset = key - offset
| otherwise = key - offset + array_size
mylookup x (_,offset,arr) = arr ! idx
where idx = myindex x offset
addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
where idx = myindex key offset
(_,prev_value) = arr ! idx
count' = case (prev_value, value) of
(0,0) -> count
(0,_) -> count + 1
(_,0) -> count - 1
otherwise -> count
difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
| count == 0 = []
| otherwise = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
where (_,v) = mylookup y m
new_m = if v == 0
then m
else addOrReplace y (v - 1) m
main = do
n <- readLn :: IO Int
pp <- getLine
m <- readLn :: IO Int
qq <- getLine
let p = map (read :: String->Int) . words $ pp
q = map (read :: String->Int) . words $ qq
startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
putStrLn . unwords . map show . sort $ difference startArray q p
感谢Carl的建议,我确定了价值和Array.我附上了堆图.
I seq'ed value and Array thanks to Carl's advice.I attach heap diagram.
[原始堆分析][] 1
[original heap profiling][]1
[在对值v
进行排序之后]
[after seq'ing value v
]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
[对值v
和Array
进行序列化之后]
[after seq'ing value v
and Array
]
difference m (x:xs) y = new_m `seq` difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
推荐答案
我发现此代码存在三个主要问题.
I see three main problems with this code.
首先(不是内存使用的原因,但肯定是性能普遍下降的原因)Array
对于此用例来说是可怕的.当更新为O(n)时,O(1)查找是无用的.
First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array
is horrible for this use case. O(1) lookups are useless when updates are O(n).
说起来,当difference
在其第一个输入上循环时,并不会强制存储在Array
中的值.它们是thunk,包含指向数组先前版本中未经评估的查找的指针.您可以通过多种方式确保在更新数组的同时求值.当difference
遍历其第二个输入时,实际上是通过将值与0进行比较而意外地完成的.
Speaking of, the values being stored in the Array
aren't forced while difference
is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference
loops over its second input, it does this accidentally, in fact, by comparing the value against 0.
第三,difference
在遍历其第一个参数时甚至不强制评估正在创建的新数组.不需要在循环的那部分中对旧数组进行求值.
Third, difference
doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.
后两个问题都需要解决,以解决空间泄漏问题.第一个问题不会导致空间泄漏,只是开销比所需的高得多.
Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.
这篇关于Haskell太空泄漏的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!