I want to select the n largest elements from a lazy list.

I heard that the mergesort implemented in Data.List.sort is lazy and does not produce more elements than necessary. That may be true in terms of comparisons, but it is certainly not the case for memory usage. The following program illustrates the issue:
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import qualified Data.Heap as Heap
import qualified Data.List as List
import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec
import System.Environment

limitSortL n xs = take n (List.sort xs)

limitSortH n xs = List.unfoldr Heap.uncons
  (List.foldl' (\acc x -> Heap.take n (Heap.insert x acc)) Heap.empty xs)

main = do
  st <- create
  rxs :: [Int] <- Vec.toList `fmap` uniformVector st (10^7)
  args <- getArgs
  case args of
    ["LIST"] -> print (limitSortL 20 rxs)
    ["HEAP"] -> print (limitSortH 20 rxs)
  return ()
Running:

Data.List:

./lazyTest LIST +RTS -s
[-9223371438221280004,-9223369283422017686,-9223368296903201811, ... ,-9223329558935830150,-9223329536207787831,-9223328937489459283]
   2,059,921,192 bytes allocated in the heap
   2,248,105,704 bytes copied during GC
     552,350,688 bytes maximum residency (5 sample(s))
       3,390,456 bytes maximum slop
            1168 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  3772 collections,     0 parallel,  1.44s,  1.48s elapsed
  Generation 1:     5 collections,     0 parallel,  0.90s,  1.13s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.82s  (  0.84s elapsed)
  GC    time    2.34s  (  2.61s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.16s  (  3.45s elapsed)

  %GC time      74.1%  (75.7% elapsed)

  Alloc rate    2,522,515,156 bytes per MUT second

  Productivity  25.9% of total user, 23.7% of total elapsed
Data.Heap:

./lazyTest HEAP +RTS -s
[-9223371438221280004,-9223369283422017686,-9223368296903201811, ... ,-9223329558935830150,-9223329536207787831,-9223328937489459283]
 177,559,536,928 bytes allocated in the heap
     237,093,320 bytes copied during GC
      80,031,376 bytes maximum residency (2 sample(s))
         745,368 bytes maximum slop
              78 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 338539 collections,     0 parallel,  1.24s,  1.31s elapsed
  Generation 1:      2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   35.24s  ( 35.46s elapsed)
  GC    time    1.24s  (  1.31s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   36.48s  ( 36.77s elapsed)

  %GC time       3.4%  (3.6% elapsed)

  Alloc rate    5,038,907,812 bytes per MUT second

  Productivity  96.6% of total user, 95.8% of total elapsed
Clearly limitSortL is much faster, but it is also extremely memory hungry. On larger lists it runs out of RAM.

Is there a faster algorithm for this problem that is not that memory hungry?

Edit: To clarify - I use Data.Heap from the heaps package; I haven't tried the heap package.
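For comparison, the bounded-memory idea can be sketched with no heap structure at all, by folding with an accumulator that holds at most n sorted elements (a sketch, not from the question; the name smallestN is made up, and for very large inputs you would additionally want to force the accumulator's spine, much like the seq trick in the accepted answer below):

```haskell
import Data.List (foldl', insert)

-- Fold over the input, keeping only the n smallest elements seen so far
-- in an ascending accumulator of length at most n.
smallestN :: Ord a => Int -> [a] -> [a]
smallestN n = foldl' step []
  where
    step acc x
      | length acc < n = insert x acc          -- still filling the accumulator
      | x < last acc   = init (insert x acc)   -- x displaces the current maximum
      | otherwise      = acc                   -- x is too large, skip it
```

Each step is O(n) in the worst case, so this only pays off when n is small relative to the list length - the same regime as the question's n = 20 against 10^7 elements.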
Best Answer
So I have actually managed to solve the problem. The idea is to throw away the fancy data structures and work by hand ;-)

Essentially, we split the input list into chunks, sort them, and fold over the resulting [[Int]] list, selecting the n smallest elements at each step.

The trickiest part is merging the accumulator with a sorted chunk in the proper way. We have to use seq, or otherwise laziness will bite and the result still requires a lot of memory to compute. Additionally, I fuse merge with take n, just to optimize things more. Here is the whole program, together with the previous attempts:
{-# LANGUAGE ScopedTypeVariables, PackageImports #-}
module Main where

import qualified Data.List as List
import qualified Data.List.Split as Split
import qualified "heaps" Data.Heap as Heap -- qualified import from the "heaps" package
import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec
import System.Environment

limitSortL n xs = take n (List.sort xs)

limitSortH n xs = List.unfoldr Heap.uncons
  (List.foldl' (\acc x -> Heap.take n (Heap.insert x acc)) Heap.empty xs)

takeSortMerge n inp =
  List.foldl' (\acc lst -> merge n acc (List.sort lst))
              [] (Split.splitEvery n inp)
  where
    merge 0 _  _  = []
    merge _ [] xs = xs
    merge _ ys [] = ys
    merge f (x:xs) (y:ys)
      | x < y     = let tail = merge (f-1) xs (y:ys) in tail `seq` (x:tail)
      | otherwise = let tail = merge (f-1) (x:xs) ys in tail `seq` (y:tail)

main = do
  st <- create
  let n1 = 10^7
      n2 = 20
  rxs :: [Int] <- Vec.toList `fmap` uniformVector st n1
  args <- getArgs
  case args of
    ["LIST"]  -> print (limitSortL n2 rxs)
    ["HEAP"]  -> print (limitSortH n2 rxs)
    ["MERGE"] -> print (takeSortMerge n2 rxs)
    _         -> putStrLn "Nothing..."
  return ()
Runtime, memory consumption and GC time:

LIST      3.96s   1168 MB   75%
HEAP     35.29s     78 MB   3.6%
MERGE     1.00s     78 MB   3.0%
just rxs  0.21s     78 MB   0.0%   -- just evaluating the random vector
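As a sanity check, the merge-based selection can be exercised standalone on a small input (a sketch: chunksOf is hand-rolled here so nothing beyond base is needed, standing in for Split.splitEvery; the first n elements of the result should agree with take n . List.sort):

```haskell
import qualified Data.List as List

-- Split a list into chunks of at most k elements (stand-in for Split.splitEvery).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = List.splitAt k xs in h : chunksOf k t

-- Chunk, sort each chunk, and merge each sorted chunk into an accumulator
-- capped at roughly n elements. The seq on the tail keeps the merge strict,
-- so no chain of unevaluated thunks builds up across chunks.
takeSortMerge :: Ord a => Int -> [a] -> [a]
takeSortMerge n inp =
  List.foldl' (\acc lst -> merge n acc (List.sort lst)) [] (chunksOf n inp)
  where
    merge 0 _  _  = []
    merge _ [] xs = xs
    merge _ ys [] = ys
    merge f (x:xs) (y:ys)
      | x < y     = let t = merge (f - 1) xs (y:ys) in t `seq` (x : t)
      | otherwise = let t = merge (f - 1) (x:xs) ys in t `seq` (y : t)
```

Note that when one side of the merge runs dry early, the result can be slightly longer than n, so a final take n is still worthwhile before printing.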