haskell - 为什么我的函数的无点版本使用更多的内存

我正在研究Euler项目问题，最后得到一个Haskell文件，该文件包含一个如下所示的函数:

matches :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
matches f cs = foldr (\(cs', n) a -> fromBool (f cs cs') * n + a) 0

从fromBool导入Foreign.Marshal.Utils只是为了快速将True转换为1和False转换为0。

我试图提高解决方案的速度，所以我尝试从foldr切换到foldl'(在过程中切换参数)，因为我认为foldr在数字上使用没有多大意义。

根据GHC的探查器，从foldr切换到foldl'导致我分配的内存多于两倍。

为了好玩，我还决定将lambda替换为该函数的无点版本:

matches :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
matches f cs = foldr ((+) . uncurry ((*) . fromBool . f cs)) 0

这导致我的内存分配比foldr版本增加了20倍。

现在，这并不是什么大问题，因为即使在20倍的情况下，总内存分配也仅与135Mb有关，并且程序的运行时相对不受影响，如果更高的内存分配版本运行得更快一些。

但是我真的很想知道如何获得这些结果，以便将来在没有太多回旋余地的时候可以选择“正确的”功能。

编辑:

GHC版本7.10.2，使用-O2 -prof -fprof-auto编译。用+RTS -p执行。

编辑2:

好吧，看起来很难复制其余的代码，这太困难了，下面是整个程序:

下方的扰流板:

{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Monad
import Data.List
import Foreign.Marshal.Utils

data Color = Red | Green | Blue deriving (Eq, Enum, Bounded, Show)

colors :: [Color]
colors = [Red ..]

matches :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
matches f x = foldr ((+) . uncurry ((*) . fromBool . f x)) 0
-- matches f x = foldr (\(y, n) a -> fromBool (f x y) * n + a) 0
-- matches f x = foldl' (\a (y, n) -> fromBool (f x y) * n + a) 0

invert :: [([Color], Int)] -> [([Color], Int)]
invert rs = (\cs -> (cs, matches valid cs rs)) <$> choices
  where
    len = maximum $ length . fst <$> rs
    choices = replicateM len colors
    valid (x : xs) (y : ys) = x /= y && valid xs ys
    valid _ _ = True

expand :: [([Color], Int)] -> [([Color], Int)]
expand rs = (\cs -> (cs, matches valid cs rs)) <$> choices
  where
    len = maximum $ length . fst <$> rs
    choices = replicateM (len + 1) colors
    valid (x1 : x2 : xs) (y : ys) = x1 /= y && x2 /= y && valid (x2 : xs) ys
    valid _ _ = True

getRow :: Int -> [([Color], Int)]
getRow 1 = flip (,) 1 . pure <$> colors
getRow n = expand . invert $ getRow (n - 1)

result :: Int -> Int
result n = sum $ snd <$> getRow n

main :: IO ()
main = print $ result 8

最佳答案

注意:这篇文章是用识字的Haskell撰写的。将其复制到文件中，另存为* .lhs，然后在GHC(i)中进行编译/加载。另外，在您编辑代码之前，我就开始编写此答案，但是 class 仍然相同。

TL; DR
Prelude函数uncurry太懒了，而您的模式匹配就足够严格了。

谨慎和免责声明

我们正在进入一个神奇而奇怪的地方。谨防。另外，我的核心能力是基本的。现在我已经失去了所有的信誉，让我们开始吧。

经过测试的代码

为了知道我们在哪里获得了额外的内存需求，拥有两个以上的功能很有用。

> import Control.Monad (forM_)

这是您原始的非指向性变体:

> matches :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matches    f cs = foldr (\(cs', n) a -> fromEnum (f cs cs') * n + a) 0

这是一个仅略有一点点的变体，参数a是eta简化的。

> matchesPF' :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPF' f cs = foldr (\(cs', n) -> (+) (fromEnum (f cs cs') * n)) 0

这是一种手动插入uncurry的变体。

> matchesPFI :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFI f cs = foldr ((+) . (\(cs', n) -> fromEnum (f cs cs') * n)) 0

这是您的无意义版本。

> matchesPF :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPF  f cs = foldr ((+) . uncurry  ((*) . fromEnum . f cs)) 0

这是使用自定义uncurry的变体，请参见下文。

> matchesPFU :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFU f cs = foldr ((+) . uncurryI ((*) . fromEnum . f cs)) 0

这是一个使用自定义惰性uncurry的变体，请参见下文。

> matchesPFL :: (a -> a -> Bool) -> a -> [(a, Int)] -> Int
> matchesPFL f cs = foldr ((+) . uncurryL ((*) . fromEnum . f cs)) 0

为了轻松测试功能，我们使用一个列表:

> funcs = [matches, matchesPF', matchesPF, matchesPFL, matchesPFU, matchesPFI]

我们自己写的uncurry:

> uncurryI :: (a -> b -> c) -> (a, b) -> c
> uncurryI f (a,b) = f a b

懒人uncurry:

> uncurryL :: (a -> b -> c) -> (a, b) -> c
> uncurryL f p = f (fst p) (snd p)

惰性变体uncurryL具有与Prelude中的变体相同的语义，例如

uncurry (\_ _ -> 0) undefined == 0 == uncurryL (\_ _ -> 0) undefined

而uncurryI在这对人的脊椎中很严格。

> main = do
>   let f a b = a < b
>   forM_ [1..10] $ \i ->
>     forM_ funcs $ \m ->
>       print $ m f i (zip (cycle [1..10]) [1..i*100000])

列表[1..i*100000]刻意依赖i，因此我们不引入CAF并倾斜我们的分配配置文件。

脱码后的代码

在深入研究概要文件之前，让我们看一下每个函数的简化代码:

==================== Desugar (after optimization) ====================
Result size of Desugar (after optimization)
  = {terms: 221, types: 419, coercions: 0}

uncurryL
uncurryL = \ @ a @ b @ c f p -> f (fst p) (snd p)

uncurryI
uncurryI = \ @ a @ b @ c f ds -> case ds of _ { (a, b) -> f a b }

-- uncurried inlined by hand
matchesPFI =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (. (+ $fNumInt)
         (\ ds ->
            case ds of _ { (cs', n) ->
            * $fNumInt (fromEnum $fEnumBool (f cs cs')) n
            }))
      (I# 0)

-- lazy uncurry
matchesPFL =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (. (+ $fNumInt)
         (uncurryL (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
      (I# 0)

-- stricter uncurry
matchesPFU =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (. (+ $fNumInt)
         (uncurryI (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
      (I# 0)

-- normal uncurry
matchesPF =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (. (+ $fNumInt)
         (uncurry (. (* $fNumInt) (. (fromEnum $fEnumBool) (f cs)))))
      (I# 0)

-- eta-reduced a
matchesPF' =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (\ ds ->
         case ds of _ { (cs', n) ->
         + $fNumInt (* $fNumInt (fromEnum $fEnumBool (f cs cs')) n)
         })
      (I# 0)

-- non-point-free
matches =
  \ @ a f cs ->
    foldr
      $fFoldable[]
      (\ ds a ->
         case ds of _ { (cs', n) ->
         + $fNumInt (* $fNumInt (fromEnum $fEnumBool (f cs cs')) n) a
         })
      (I# 0)

到目前为止，一切似乎都很好。没有什么奇怪的了。类型类函数被其字典变体替换，例如foldr变成文件夹$ fFoldable []`，因为我们在列表中将其称为。

个人资料

   Mon Jul 18 15:47 2016 Time and Allocation Profiling Report  (Final)

       Prof +RTS -s -p -RTS

    total time  =        1.45 secs   (1446 ticks @ 1000 us, 1 processor)
    total alloc = 1,144,197,200 bytes  (excludes profiling overheads)

COST CENTRE  MODULE    %time %alloc

matchesPF'   Main       13.6    0.0
matchesPF    Main       13.3   11.5
main.\.\     Main       11.8   76.9
main.f       Main       10.9    0.0
uncurryL     Main        9.5   11.5
matchesPFU   Main        8.9    0.0
matchesPFI   Main        7.3    0.0
matches      Main        6.9    0.0
matchesPFL   Main        6.3    0.0
uncurryI     Main        5.3    0.0
matchesPF'.\ Main        2.6    0.0
matchesPFI.\ Main        2.0    0.0
matches.\    Main        1.5    0.0


                                                             individual     inherited
COST CENTRE        MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN               MAIN                     44           0    0.0    0.0   100.0  100.0
 main              Main                     89           0    0.0    0.0   100.0  100.0
  main.\           Main                     90          10    0.0    0.0   100.0  100.0
   main.\.\        Main                     92          60   11.8   76.9   100.0  100.0
    funcs          Main                     93           0    0.0    0.0    88.2   23.1
     matchesPFI    Main                    110          10    7.3    0.0    11.7    0.0
      matchesPFI.\ Main                    111     5500000    2.0    0.0     4.4    0.0
       main.f      Main                    112     5500000    2.4    0.0     2.4    0.0
     matchesPFU    Main                    107          10    8.9    0.0    15.3    0.0
      uncurryI     Main                    108     5500000    5.3    0.0     6.4    0.0
       main.f      Main                    109     5500000    1.1    0.0     1.1    0.0
     matchesPFL    Main                    104          10    6.3    0.0    17.7   11.5
      uncurryL     Main                    105     5500000    9.5   11.5    11.4   11.5
       main.f      Main                    106     5500000    1.9    0.0     1.9    0.0
     matchesPF     Main                    102          10   13.3   11.5    15.4   11.5
      main.f       Main                    103     5500000    2.1    0.0     2.1    0.0
     matchesPF'    Main                     99          10   13.6    0.0    17.2    0.0
      matchesPF'.\ Main                    100     5500000    2.6    0.0     3.6    0.0
       main.f      Main                    101     5500000    1.0    0.0     1.0    0.0
     matches       Main                     94          10    6.9    0.0    10.9    0.0
      matches.\    Main                     97     5500000    1.5    0.0     4.0    0.0
       main.f      Main                     98     5500000    2.5    0.0     2.5    0.0
 CAF               Main                     87           0    0.0    0.0     0.0    0.0
  funcs            Main                     91           1    0.0    0.0     0.0    0.0
  main             Main                     88           1    0.0    0.0     0.0    0.0
   main.\          Main                     95           0    0.0    0.0     0.0    0.0
    main.\.\       Main                     96           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Handle.FD         84           0    0.0    0.0     0.0    0.0
 CAF               GHC.Conc.Signal          78           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding          76           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Handle.Text       75           0    0.0    0.0     0.0    0.0
 CAF               GHC.IO.Encoding.Iconv    59           0    0.0    0.0     0.0    0.0

Ignore the main\.\. noise, it's just the list. However, there's one point that one should notice immediately: matchesPF and uncurryL use the same alloc%:

matchesPF    Main       13.3   11.5
uncurryL     Main        9.5   11.5

到达核心

现在是时候检查生成的CORE(ghc -ddump-simpl)。我们会注意到，大多数功能都已转换为工作程序包装器，它们看起来大致相同(-dsuppress-all -dsuppress-uniques):

$wa5
$wa5 =
  \ @ a1 w w1 w2 ->
    letrec {
      $wgo
      $wgo =
        \ w3 ->
          case w3 of _ {
            [] -> 0;
            : y ys ->
              case y of _ { (cs', n) ->
              case $wgo ys of ww { __DEFAULT ->
              case w w1 cs' of _ {
                False -> case n of _ { I# y1 -> ww };
                True -> case n of _ { I# y1 -> +# y1 ww }
              }
              }
              }
          }; } in
    $wgo w2

这是您通常的包装工。 $wgo接受一个列表，检查它是否为空，头部严格(case y of _ { (cs', n) ->…)并在递归结果$wgo ys of ww中懒惰。

所有功能看起来都一样。好吧，除了matchesPF(您的变体)之外的所有东西

-- matchesPF
$wa3 =
  \ @ a1 w w1 w2 ->
    letrec {
      $wgo =
        \ w3 ->
          case w3 of _ {
            [] -> 0;
            : y ys ->
              case $wgo ys of ww { __DEFAULT ->
              case let {
                     x = case y of _ { (x1, ds) -> x1 } } in
                   case w w1 x of _ {
                     False ->
                       case y of _ { (ds, y1) -> case y1 of _ { I# y2 -> main13 } };
                              -- main13 is just #I 0
                     True -> case y of _ { (ds, y1) -> y1 }
                   }
              of _ { I# x ->
              +# x ww
              }
              }
          }; } in
    $wgo w2

和matchesPFL(使用惰性uncurryL的变体)

-- matchesPFL
$wa2
$wa2 =
  \ @ a1 w w1 w2 ->
    letrec {
      $wgo =
        \ w3 ->
          case w3 of _ {
            [] -> 0;
            : y ys ->
              case $wgo ys of ww { __DEFAULT ->
              case snd y of ww1 { I# ww2 ->
              case let {
                     x = fst y } in
                   case w w1 x of _ {
                     False -> main13;
                     True -> ww1
                   }
              of _ { I# x ->
              +# x ww
              }
              }
              }
          }; } in
    $wgo w2

它们实际上是相同的。并且它们都包含 let绑定(bind)。这将造成重击，并通常导致更差的空间需求。

解决方案

我认为罪魁祸首很明显。它是uncurry。 GHC希望强制使用正确的语义

uncurry (const (const 0)) undefined

但是，这增加了懒惰和额外的重击。您的非pointfree变体不会引入该行为，因为您在该对上进行模式匹配:

foldr (\(cs', n) a -> …)

还是不相信我？使用惰性模式匹配

foldr (\ ~(cs', n) a -> …)

并且您会注意到matches的行为与matchesPF相同。因此，请使用uncurry稍微严格一些的变体。 uncurryI足以给严格性分析器一个提示。

请注意，成对的行为是众所周知的。 RWH spents a whole chapter trying to optimize the behaviour of a single function，中间对导致问题。

关于haskell - 为什么我的函数的无点版本使用更多的内存，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/38428512/