因此,在编写程序以从存储在单独文件中的多个大整数向量制作协方差矩阵的过程中,我遇到了很多麻烦。我从写作开始

mean xs = realToFrac (sum xs) / realToFrac (length xs)
cov xs ys = mean (zipWith (*) xs ys) - mean xs * mean ys
covmat vectors = [cov xs ys | ys <- vectors, xs <- vectors]

该方法适用于少量输入,但您可以看到即使是“均值”也是多么低效。进行求和时,它将所有xs保留在内存中,因为它们将由“length”使用。这样可以解决:
mean xs = realToFrac thisSum / realToFrac thisLength
    where (thisSum, thisLength) = foldl' (\(s,l) y-> (s+y,l+1)) (0,0) xs

但是后来我遇到了“cov”的同样问题。当我用这种风格重写“cov”时,它最终并没有使用我的“mean”函数。当我编写“covmat”功能时,我还有一个层次需要提高,这将变得极其复杂。

因此,我有两个目标,似乎有冲突:
  • 遍历每个列表一次,而不将其保留在内存中
  • 将“covmat”分解为更简单,有意义的函数,特别是“cov”和“mean”

  • 我没有办法将这两个目标与我对Haskell的了解统一。但是从概念上讲,这似乎很简单:所有这些功能都需要“监听”列表中出现的相同列表的值。Haskell中有没有办法像这样组织它?如果这需要不同的数据结构或其他库,我可以接受。

    最佳答案

    因此,我进行了一些挖掘,并提出了以下建议。

    编辑:Gist对于那些喜欢SO格式的人。

    首先,一些均值的实现

    module Means where
    
    import Data.List
    import Control.Applicative
    
    mean :: (Fractional a1, Real a, Foldable t) => t a -> a1
    mean xs = realToFrac (sum xs) / realToFrac (length xs)
    
    mean' :: (Foldable f, Fractional a) => f a -> a
    mean' = liftA2 (/) sum (fromIntegral . length)
    
    data Pair = Pair {-# UNPACK #-}!Int {-# UNPACK #-}!Double
    
    mean'' :: [Double] -> Double
    mean'' xs = s / fromIntegral n
      where
        Pair n s = foldl' k (Pair 0 0) xs
        k (Pair n s) x = Pair (n+1) (s+x)
    

    最后一个使用显式的严格对构造函数。 IIRC的(,)是惰性的,因此这应该为我们提供更好的性能特征。
    module Covariance where
    
    import Means
    
    covariance :: (Fractional a, Real a1) => [a1] -> [a1] -> a
    covariance xs ys = mean (zipWith (*) xs ys) - mean xs * mean ys
    
    covariance' :: Fractional a => [a] -> [a] -> a
    covariance' xs ys = mean' (zipWith (*) xs ys) - mean' xs * mean' ys
    
    covariance'' :: [Double] -> [Double] -> Double
    covariance'' xs ys = mean'' (zipWith (*) xs ys) - mean'' xs * mean'' ys
    
    covariance''' :: [Double] -> [Double] -> Double
    covariance''' xs ys =
        let mx = mean'' xs
            my = mean'' ys
        in
        sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / fromIntegral (length xs)
    

    我使用每种不同的均值选项尝试了cov的几个版本,然后是一个较丑的“性能”版本。

    我把一个简单的Main和一些硬编码的列表放在一起进行测试。
    module Main where
    
    import Means
    import Covariance
    
    v1 = [1000000..2000000]
    
    v2 = [2000000..3000000]
    
    main :: IO ()
    main = do
      -- let cov = covariance v1 v2
      -- let cov = covariance' v1 v2
      -- let cov = covariance'' v1 v2
      let cov = covariance''' v1 v2
      print cov
    

    使用-rtsopts编译并使用+RTS -s运行,我得到了以下分配信息。
    covariance:
    8.33335e10
         531,816,984 bytes allocated in the heap
         890,566,720 bytes copied during GC
         148,609,912 bytes maximum residency (11 sample(s))
          15,649,528 bytes maximum slop
                 374 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       981 colls,     0 par    0.385s   0.389s     0.0004s    0.0012s
      Gen  1        11 colls,     0 par    0.445s   0.584s     0.0531s    0.2084s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.194s  (  0.168s elapsed)
      GC      time    0.830s  (  0.973s elapsed)
      EXIT    time    0.001s  (  0.029s elapsed)
    module Main where
      Total   time    1.027s  (  1.172s elapsed)
    
      %GC     time      80.9%  (83.0% elapsed)
    
      Alloc rate    2,741,140,975 bytes per MUT second
    
      Productivity  19.1% of total user, 16.8% of total elapsed
    
    covariance':
    8.333350000320508e10
         723,822,456 bytes allocated in the heap
         891,626,240 bytes copied during GC
         185,629,664 bytes maximum residency (11 sample(s))
           3,693,296 bytes maximum slop
                 435 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1372 colls,     0 par    0.388s   0.392s     0.0003s    0.0010s
      Gen  1        11 colls,     0 par    0.388s   0.551s     0.0501s    0.1961s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.227s  (  0.202s elapsed)
      GC      time    0.777s  (  0.943s elapsed)
      EXIT    time    0.001s  (  0.029s elapsed)
      Total   time    1.006s  (  1.176s elapsed)
    
      %GC     time      77.2%  (80.2% elapsed)
    
      Alloc rate    3,195,430,190 bytes per MUT second
    
      Productivity  22.8% of total user, 19.6% of total elapsed
    
    covariance'':
    8.333350000320508e10
         456,108,392 bytes allocated in the heap
         394,432,096 bytes copied during GC
          79,295,648 bytes maximum residency (15 sample(s))
          21,161,776 bytes maximum slop
                 201 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       861 colls,     0 par    0.085s   0.089s     0.0001s    0.0005s
      Gen  1        15 colls,     0 par    0.196s   0.274s     0.0182s    0.0681s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.124s  (  0.106s elapsed)
      GC      time    0.282s  (  0.362s elapsed)
      EXIT    time    0.001s  (  0.021s elapsed)
      Total   time    0.408s  (  0.491s elapsed)
    
      %GC     time      69.1%  (73.7% elapsed)
    
      Alloc rate    3,681,440,521 bytes per MUT second
    
      Productivity  30.9% of total user, 25.9% of total elapsed
    
    covariance''':
    8.333349999886264e10
         336,108,336 bytes allocated in the heap
         202,943,312 bytes copied during GC
          47,493,864 bytes maximum residency (10 sample(s))
          13,578,520 bytes maximum slop
                 115 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       633 colls,     0 par    0.053s   0.055s     0.0001s    0.0002s
      Gen  1        10 colls,     0 par    0.089s   0.131s     0.0131s    0.0472s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.095s  (  0.086s elapsed)
      GC      time    0.142s  (  0.186s elapsed)
      EXIT    time    0.001s  (  0.011s elapsed)
      Total   time    0.240s  (  0.286s elapsed)
    
      %GC     time      59.2%  (65.1% elapsed)
    
      Alloc rate    3,522,631,228 bytes per MUT second
    
      Productivity  40.8% of total user, 34.1% of total elapsed
    

    如您所见,很多分配取决于我们所采用的均值。通过将mean''与严格对构造器一起使用,即使是天真的zipWith实现,我们也能获得最大的提升。

    我正在使用 weigh 来连接实现,所以我可能会再获得一些数据。

    除了调整组件功能外,我不知道一种处理covmat的性能更高的方法,但是严格的pair构造函数至少应该改善您的空间特性,而不管您做什么。

    编辑:weigh结果
    Case                                  Allocated    GCs
    naive mean                          723,716,168  1,382
    applicative mean                    723,714,736  1,382
    optimized mean, naive zipWith       456,000,688    875
    optimized mean, hand-tuned zipWith  336,000,672    642
    

    第二次编辑:

    我捕获了加百利的超赞foldl,以查看我们无需手动调整显式严格对的意思就能获得什么样的性能。
    import qualified Control.Foldl as L
    
    mean''' :: [Double] -> Double
    mean''' = L.fold (liftA2 (/) L.sum L.genericLength)
    
    covariance'''' :: [Double] -> [Double] -> Double
    covariance'''' xs ys = mean''' (zipWith (*) xs ys) - mean''' xs * mean''' ys
    
    covariance''''' :: [Double] -> [Double] -> Double
    covariance''''' xs ys = let mx = mean''' xs
                                my = mean''' ys
                            in
                            mean''' (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    

    分配结果:
    covariance'''':
    8.333350000320508e10
         336,108,272 bytes allocated in the heap
         222,635,752 bytes copied during GC
          61,198,528 bytes maximum residency (10 sample(s))
          13,627,544 bytes maximum slop
                 140 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       633 colls,     0 par    0.052s   0.054s     0.0001s    0.0003s
      Gen  1        10 colls,     0 par    0.105s   0.155s     0.0155s    0.0592s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.110s  (  0.099s elapsed)
      GC      time    0.156s  (  0.209s elapsed)
      EXIT    time    0.001s  (  0.014s elapsed)
      Total   time    0.269s  (  0.323s elapsed)
    
      %GC     time      58.1%  (64.5% elapsed)
    
      Alloc rate    3,054,641,122 bytes per MUT second
    
      Productivity  41.8% of total user, 34.9% of total elapsed
    
    covariance''''':
    8.333349999886264e10
         336,108,232 bytes allocated in the heap
         202,942,400 bytes copied during GC
          47,493,816 bytes maximum residency (10 sample(s))
          13,578,568 bytes maximum slop
                 115 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       633 colls,     0 par    0.057s   0.059s     0.0001s    0.0003s
      Gen  1        10 colls,     0 par    0.086s   0.126s     0.0126s    0.0426s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.096s  (  0.087s elapsed)
      GC      time    0.143s  (  0.184s elapsed)
      EXIT    time    0.001s  (  0.011s elapsed)
      Total   time    0.241s  (  0.285s elapsed)
    
      %GC     time      59.2%  (64.7% elapsed)
    
      Alloc rate    3,504,449,342 bytes per MUT second
    
      Productivity  40.8% of total user, 34.5% of total elapsed
    

    weigh结果:
    foldl mean                          336,000,568    642
    foldl mean, tuned zipWith           336,000,568    642
    

    总而言之,看起来foldl实现是最好的选择。它在做什么非常清楚,并提出了一些非常花哨的技巧来有效地输入流,达到或超过我们手动调整的结果。您可能可以使用其他数据结构从中获得更多 yield ,但是对于不起眼的列表来说,这是相当不错的性能。 :D

    第三编辑:

    我以前从未使用过Edward的folds,因此我可能做的非常愚蠢,但是我也尝试了使用这些实现。
    import qualified Data.Fold as F
    
    sumL :: Num a => F.L a a
    sumL = F.L id (+) 0
    
    lengthL :: Num b => F.L a b
    lengthL = F.L id (const . (+1)) 0
    
    mean'''' :: [Double] -> Double
    mean'''' = flip F.run (liftA2 (/) sumL lengthL)
    
    covariance'''''' :: [Double] -> [Double] -> Double
    covariance'''''' xs ys = mean'''' (zipWith (*) xs ys) - mean'''' xs * mean'''' ys
    
    covariance''''''' :: [Double] -> [Double] -> Double
    covariance''''''' xs ys = let mx = mean'''' xs
                                  my = mean'''' ys
                            in
                            mean'''' (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    

    分配结果:
    covariance'''''':
    8.333350000320508e10
         456,108,488 bytes allocated in the heap
         394,432,096 bytes copied during GC
          79,295,648 bytes maximum residency (15 sample(s))
          21,161,776 bytes maximum slop
                 201 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       861 colls,     0 par    0.089s   0.092s     0.0001s    0.0003s
      Gen  1        15 colls,     0 par    0.198s   0.276s     0.0184s    0.0720s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.135s  (  0.119s elapsed)
      GC      time    0.287s  (  0.367s elapsed)
      EXIT    time    0.001s  (  0.019s elapsed)
      Total   time    0.425s  (  0.506s elapsed)
    
      %GC     time      67.6%  (72.5% elapsed)
    
      Alloc rate    3,388,218,993 bytes per MUT second
    
      Productivity  32.3% of total user, 27.1% of total elapsed
    
    covariance''''''':
    8.333349999886264e10
         456,108,552 bytes allocated in the heap
         291,275,200 bytes copied during GC
          62,670,040 bytes maximum residency (11 sample(s))
          15,029,432 bytes maximum slop
                 172 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       862 colls,     0 par    0.068s   0.070s     0.0001s    0.0003s
      Gen  1        11 colls,     0 par    0.149s   0.210s     0.0191s    0.0570s
    
      INIT    time    0.000s  (  0.002s elapsed)
      MUT     time    0.118s  (  0.104s elapsed)
      GC      time    0.217s  (  0.280s elapsed)
      EXIT    time    0.001s  (  0.016s elapsed)
      Total   time    0.337s  (  0.403s elapsed)
    
      %GC     time      64.3%  (69.6% elapsed)
    
      Alloc rate    3,870,870,585 bytes per MUT second
    
      Productivity  35.7% of total user, 29.9% of total elapsed
    

    weigh结果:
    folds mean                          456,000,784    875
    folds mean, tuned zipWith           456,000,888    871
    

    另一个编辑:我也尝试过使用folds而不是L'L选项,但是结果是相同的。

    09-19 07:27