list - 在Haskell中即时减少 list

假设我有一个函数f，它接受一些输入并产生一个数字。在f函数中，根据输入创建一个列表，然后将其缩减(例如，使用foldl' g)以产生最终的输出编号。因为毕竟要减少中间列表，所以可以在不表达中间列表的情况下应用reduce函数g。此处的目标是限制用于存储(或表示，如果“存储”一个不太准确的单词的)列表的内存。

为了说明这一点，此函数foldPairProduct为中间列表使用O(N1 * N2)空间(由于表达和惰性求值而消耗的空间可能更复杂，但我认为它是成比例的或更差的)。这里N1, N2是两个输入列表的大小。

foldPairProduct :: (Num a, Ord a)  => (a -> a -> a) -> [a] -> [a] -> a
foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

逻辑的替代实现是foldPairProduct'，它占用O(2 * 2)空间。

foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a
foldPairProduct' _ _ [] = Nothing
foldPairProduct' _ [] _ = Nothing
foldPairProduct' f (x:xs) (y:ys) =
  foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y],
            foldPairProduct' f xs ys]

对于foldCrossProduct而言，这种情况更加严重，其实现与foldPairProduct相似，不同之处在于它接受多个列表作为输入。对于中间列表，它的空间复杂度(仍然是命令式语言的含义)是O(N1 * N2 * ...* Nk)，其中k是[[a]]的长度。

foldCrossProduct :: Num a => (a -> a -> a) -> [[a]]  -> a
foldCrossProduct f xss = foldl1 f (crossProduct xss)

crossProduct :: Num a => [[a]] -> [a]
crossProduct [] = []
crossProduct (xs:[]) = xs
crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss]

如果我们遵循foldPairProduct'的实现思想，那么空间复杂度将是k^2，这将大大节省空间。我的问题是:

我为一对列表实现了foldPairProduct'。但是，似乎要为任意数量的列表实现它并不是一件容易的事。

我不是要把Haskell与命令式语言进行比较，但是有没有一种实现将使用常量空间(或者换句话说，不表达所提及长度的中间列表)呢？也许Monad会有所帮助，但我对此很陌生。

编译器实际上是否发挥了魔力？就是说，它注意到列表是中间的并且要减少，并且确实找到了一种有效利用空间进行评估的方法。毕竟，这就是我认为惰性评估和编译器优化的目的。

欢迎任何评论。谢谢。

更新1

性能测试通过改变输入大小foldPairProduct以及观察GC复制的字节数，证实了对foldCrossProduct和N1, N2, N3的“空间复杂性”的分析。

性能测试打乱了对foldPairProduct'的分析，该分析令人惊讶地显示了N1 * N2甚至更糟的空间使用情况。这可能是由于对递归调用的评估效率低下。结果附在下面(ghc设置与Yuras相同)。

更新2

我从评论和答案中学到了一些更新的实验。对于foldPairProduct，正在使用的总内存与Daniel Fischer解释的空间复杂度一致。

对于foldCrossProduct，尽管Daniel的复杂度分析对我来说很有意义，但结果并未显示出线性内存使用情况。按照Daniel的建议，交换了x <- xs和y <- crossproduct ys，确实实现了线性空间复杂度。

对于foldCrossProduct (max) [[1..100],[1..n], [1..1000]]，n = 100、1000、10000、100000，使用的内存为2、2、3、14 MB。

foldPairProduct [1..n] [1..10000]

n = 100
  120,883,320 bytes allocated in the heap
   56,867,728 bytes copied during GC
      428,384 bytes maximum residency (50 sample(s))
       98,664 bytes maximum slop
            3 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
 1,200,999,280 bytes allocated in the heap
   569,837,360 bytes copied during GC
       428,384 bytes maximum residency (500 sample(s))
        99,744 bytes maximum slop
             3 MB total memory in use (0 MB lost due to fragmentation)
n = 10000

  12,002,152,040 bytes allocated in the heap
   5,699,468,024 bytes copied during GC
         428,384 bytes maximum residency (5000 sample(s))
          99,928 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 100000

 120,013,672,800 bytes allocated in the heap
  56,997,625,608 bytes copied during GC
         428,384 bytes maximum residency (50000 sample(s))
          99,984 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

foldPairProduct [1..10000] [1..n]

n = 100

     121,438,536 bytes allocated in the heap
          55,920 bytes copied during GC
          32,408 bytes maximum residency (1 sample(s))
          19,856 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

n = 1000

   1,201,511,296 bytes allocated in the heap
         491,864 bytes copied during GC
          68,392 bytes maximum residency (1 sample(s))
          20,696 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

n = 10000

  12,002,232,056 bytes allocated in the heap
   5,712,004,584 bytes copied during GC
         428,408 bytes maximum residency (5000 sample(s))
          98,688 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 100000

 120,009,432,816 bytes allocated in the heap
  81,694,557,064 bytes copied during GC
       4,028,408 bytes maximum residency (10002 sample(s))
         769,720 bytes maximum slop
              14 MB total memory in use (0 MB lost due to fragmentation)

foldPairProduct [1..n] [1..n]

n = 100
 1,284,024 bytes allocated in the heap
    15,440 bytes copied during GC
    32,336 bytes maximum residency (1 sample(s))
    19,920 bytes maximum slop
         1 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
 120,207,224 bytes allocated in the heap
     114,848 bytes copied during GC
      68,336 bytes maximum residency (1 sample(s))
      24,832 bytes maximum slop
           1 MB total memory in use (0 MB lost due to fragmentation)

n = 10000

  12,001,432,024 bytes allocated in the heap
   5,708,472,592 bytes copied during GC
         428,336 bytes maximum residency (5000 sample(s))
          99,960 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 100000
 1,200,013,672,824 bytes allocated in the heap
   816,574,713,664 bytes copied during GC
         4,028,336 bytes maximum residency (100002 sample(s))
           770,264 bytes maximum slop
                14 MB total memory in use (0 MB lost due to fragmentation)

foldCrossProduct(max)[[1..n]，[1..100]，[1..1000]]

n = 100
     105,131,320 bytes allocated in the heap
      38,697,432 bytes copied during GC
         427,832 bytes maximum residency (34 sample(s))
         209,312 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
   1,041,254,480 bytes allocated in the heap
     374,148,224 bytes copied during GC
         427,832 bytes maximum residency (334 sample(s))
         211,936 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 10000
  10,402,479,240 bytes allocated in the heap
   3,728,429,728 bytes copied during GC
         427,832 bytes maximum residency (3334 sample(s))
         215,936 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

foldCrossProduct(max)[[1..100]，[1..n]，[1..1000]]

n = 100
     105,131,344 bytes allocated in the heap
      38,686,648 bytes copied during GC
         431,408 bytes maximum residency (34 sample(s))
         205,456 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
   1,050,614,504 bytes allocated in the heap
     412,084,688 bytes copied during GC
       4,031,456 bytes maximum residency (53 sample(s))
       1,403,976 bytes maximum slop
              15 MB total memory in use (0 MB lost due to fragmentation)
n = 10000
    quit after over 1362 MB total memory in use (0 MB lost due to fragmentation)

foldPairProduct'[1..n] [1..n]

n = 100
 4,351,176 bytes allocated in the heap
    59,432 bytes copied during GC
    74,296 bytes maximum residency (1 sample(s))
    21,320 bytes maximum slop
         1 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
 527,009,960 bytes allocated in the heap
  45,827,176 bytes copied during GC
     211,680 bytes maximum residency (1 sample(s))
      25,760 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

最佳答案

foldPairProduct :: (Num a, Ord a)  => (a -> a -> a) -> [a] -> [a] -> a
foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

可以成为一个好记性的公民。第二个参数ys可以重复使用，因此在计算过程中必须将其完全存储在内存中，但是中间列表在被消耗时会延迟生成，因此仅贡献了恒定的内存量，从而使O(length ys)总体空间复杂化。当然，必须生成和使用length xs * length ys列表单元格，因此总体分配为O(length xs * length ys) [假设每个a值使用有界空间]。通过使用+RTS -A1M提供更大的分配区域，可以大大减少GC期间复制的字节数(从而减少GC所需的时间)，该数目从

3,717,333,376 bytes copied during GC

默认设置为

20,445,728 bytes copied during GC

以及GC time 4.88s和GC time 0.07s从xs == ys = [1 .. 10000] :: [Int]到f = (+)的时间。

但这取决于严格性分析器的工作-如果使用的类型是例如，它可以很好地工作Int是已知的，在编译期间是已知的，并且组合功能是严格的。如果代码不是专用的，或者组合函数不严格，则折叠将产生O(length xs * length ys)大小的代码。通过使用更严格的foldl1'可以缓解该问题。

foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a
foldPairProduct' _ _ [] = Nothing
foldPairProduct' _ [] _ = Nothing
foldPairProduct' f (x:xs) (y:ys) =
  foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y],
            foldPairProduct' f xs ys]

遇到严格性不足的问题，此处编译器无法使Just构造函数包装的值严格，因为整体结果可能不需要该值，因此折叠通常会产生O(length xs * length ys)大小的thunk Just-当然，对于某些f，例如const，它会保持良好的性能。如果要使用所有值，要使其成为一个良好的内存公民，必须使用足够严格的组合函数f，并在结果中的Just下也强制使用该值(如果它是Just)；使用foldl1'也有帮助。这样，它就可能具有O(length ys + length xs)空间复杂度(列表xs和ys多次使用，因此可以重复使用)。

foldCrossProduct :: Num a => (a -> a -> a) -> [[a]]  -> a
foldCrossProduct f xss = foldl1 f (crossProduct xss)

crossProduct :: Num a => [[a]] -> [a]
crossProduct [] = []
crossProduct (xs:[]) = xs
crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss]

尽管GHC几乎不执行CSE(公共(public)子表达式消除)，但列表crossProduct xss将(可能)在此处在不同的x之间共享，从而产生O(N2*...*Nk)空间复杂性。如果列表中元素的顺序无关紧要，请重新排序为

crossProduct (xs:xss) = [x * y | y <- crossProduct xss, x <- xs]

帮助。然后crossProduct xss不必立即在内存中，因此可以递增地生成和使用，只有xs必须被记住，因为它已被多次使用。对于递归调用，必须共享其余列表中的第一个，这样会产生整个O(N1+...+Nk-1)空间复杂性。

关于list - 在Haskell中即时减少 list ，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/14992037/