Problem description

I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically a Haskell version of md5deep. All is fine and dandy except when the current directory has a very large number of files, in which case I get an error like:

<program>: <currentFile>: openBinaryFile: resource exhausted (Too many open files)

It seems Haskell's laziness is causing it not to close files, even after the corresponding line of output has been printed.

The relevant code is below. The function of interest is getList.

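import Control.Monad (forM, liftM)
import Data.Digest.MD5 (hash) -- assumed: Crypto package, hash :: [Word8] -> [Word8]
import Data.Word (Word8)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import Text.Printf (printf)
import qualified Data.ByteString.Lazy as BS

main :: IO ()
main = putStr . unlines =<< getList "."

getList :: FilePath -> IO [String]
getList p =
    let getFileLine path = liftM (\c -> (hex $ hash $ BS.unpack c) ++ " " ++ path) (BS.readFile path)
    in mapM getFileLine =<< getRecursiveContents p

hex :: [Word8] -> String
hex = concatMap (\x -> printf "%02x" (toInteger x))

getRecursiveContents :: FilePath -> IO [FilePath]
-- ^ Just gets the paths to all the files in the given directory.
-- (Body not shown in the question; this Real World Haskell-style version is an assumed stand-in.)
getRecursiveContents dir = do
    names <- getDirectoryContents dir
    paths <- forM (filter (`notElem` [".", ".."]) names) $ \name -> do
        let path = dir </> name
        isDir <- doesDirectoryExist path
        if isDir then getRecursiveContents path else return [path]
    return (concat paths)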

Are there any ideas on how I could solve this problem?

The entire program is available here: http://haskell.pastebin.com/PAZm0Dcb

I have plenty of files that don't fit into RAM, so I am not looking for a solution that reads the entire file into memory at once.

Recommended answer

Lazy IO is very bug-prone.
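
To see why the handles pile up: `mapM getFileLine` runs every `BS.readFile` before `putStr` consumes any output, and a lazy `readFile` opens its file immediately but closes the handle only after the contents have been read to EOF. So every file ends up open at once. A minimal sketch that keeps lazy I/O but processes one file at a time, forcing each checksum before moving on, might look like this. To stay dependency-free it swaps MD5 for a 64-bit FNV-1a checksum; `fnv1a`, `checksumFile` and `printChecksums` are hypothetical names, not part of the original program.

```haskell
import Control.Exception (evaluate)
import Data.Bits (xor)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)
import Text.Printf (printf)

-- 64-bit FNV-1a: a dependency-free stand-in for MD5 in this sketch.
fnv1a :: BL.ByteString -> Word64
fnv1a = BL.foldl' step 0xcbf29ce484222325
  where
    step h b = (h `xor` fromIntegral b) * 0x100000001b3

-- BL.readFile opens the handle immediately but closes it only once the
-- contents have been read to EOF. 'evaluate' forces the whole fold right
-- here, so the file is closed before this action returns.
checksumFile :: FilePath -> IO Word64
checksumFile path = BL.readFile path >>= evaluate . fnv1a

-- One file at a time: each line is printed, and its file closed, before
-- the next file is opened (unlike mapM followed by putStr . unlines).
printChecksums :: [FilePath] -> IO ()
printChecksums = mapM_ $ \path -> do
  h <- checksumFile path
  printf "%016x  %s\n" h path
```

Because the digest here is a `Word64`, evaluating it to weak head normal form is enough to drive the whole fold; a `String` digest such as the question's `hex` output would need full forcing (for example with `Control.DeepSeq`). The fix still depends on lazy I/O closing the handle at EOF, which is the fragile behavior being warned about.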

As dons suggested, you should use strict IO.
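
Strict I/O for this task needs nothing beyond `base` and `bytestring`: read fixed-size chunks with `Data.ByteString.hGetSome` and fold a step function over them inside `withBinaryFile`, which closes the handle even if an exception is thrown. This is a sketch rather than the answer's code: `foldFileChunks` is a hypothetical helper, and the FNV-1a `fnvStep` stands in for an MD5 update function.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as B
import Data.Word (Word64)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Strictly fold 'step' over a file, one chunk at a time. Only the current
-- chunk is in memory, and withBinaryFile (bracket-based) closes the handle
-- when the fold finishes or throws.
foldFileChunks :: Int -> (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldFileChunks chunkSize step z path =
  withBinaryFile path ReadMode (\h -> go h z)
  where
    go h acc = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return acc               -- EOF: hGetSome returns an empty chunk
        else go h $! step acc chunk   -- force the accumulator at each step

-- 64-bit FNV-1a update over one strict chunk (stand-in for an MD5 update).
fnvStep :: Word64 -> B.ByteString -> Word64
fnvStep = B.foldl' (\h b -> (h `xor` fromIntegral b) * 0x100000001b3)

checksumFile :: FilePath -> IO Word64
checksumFile = foldFileChunks 65536 fnvStep 0xcbf29ce484222325
```

Memory use is bounded by the chunk size rather than the file size, and each `checksumFile` call closes its file before returning, so running it with `mapM` over the whole file list keeps at most one file open at a time.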

You can use a tool such as Iteratee to help you structure strict IO code. My favorite tool for this job is monadic lists.

import Control.Monad (when)
import Control.Monad.ListT (ListT) -- List
import Control.Monad.IO.Class (liftIO) -- transformers
import Data.Binary (encode) -- binary
import Data.Digest.Pure.MD5 -- pureMD5
import Data.List.Class (repeat, takeWhile, foldlL) -- List
import System.IO (IOMode(ReadMode), openFile, hClose)
import qualified Data.ByteString.Lazy as BS
import Prelude hiding (repeat, takeWhile)

-- Fold the MD5 context over the file's chunks, then serialize the digest.
hashFile :: FilePath -> IO BS.ByteString
hashFile =
    fmap (encode . md5Finalize) . foldlL md5Update md5InitialContext . strictReadFileChunks 1024

-- A monadic list of strictly read chunks. The handle is closed as soon as
-- an empty chunk (EOF) is read, and takeWhile ends the list there.
strictReadFileChunks :: Int -> FilePath -> ListT IO BS.ByteString
strictReadFileChunks chunkSize filename =
    takeWhile (not . BS.null) $ do
        handle <- liftIO $ openFile filename ReadMode
        repeat () -- this makes the lines below loop
        chunk <- liftIO $ BS.hGet handle chunkSize
        when (BS.null chunk) . liftIO $ hClose handle
        return chunk

I used the "pureMD5" package here because "Crypto" doesn't seem to offer a "streaming" md5 implementation.
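
"Streaming" here means an init/update/finalize interface: a context is threaded through the input one chunk at a time, so the whole file never has to be in memory, and the result must not depend on where the chunk boundaries fall. The sketch below shows that shape with hypothetical names (`Ctx`, `initialCtx`, `updateCtx`, `finalizeCtx`) modeled loosely on `md5InitialContext`/`md5Update`/`md5Finalize`, again using 64-bit FNV-1a instead of MD5.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as B
import Data.List (foldl')
import Data.Word (Word64)
import Text.Printf (printf)

-- Hypothetical streaming-hash context (FNV-1a state, not an MD5 context).
newtype Ctx = Ctx Word64

initialCtx :: Ctx
initialCtx = Ctx 0xcbf29ce484222325

-- Absorb one chunk of any length. FNV-1a works byte by byte, so no partial
-- block has to be buffered; MD5 works on 512-bit blocks, so a real MD5
-- context must buffer leftover bytes or require whole-block chunks.
updateCtx :: Ctx -> B.ByteString -> Ctx
updateCtx (Ctx h0) = Ctx . B.foldl' (\h b -> (h `xor` fromIntegral b) * 0x100000001b3) h0

-- Render the final state as 16 hex digits.
finalizeCtx :: Ctx -> String
finalizeCtx (Ctx h) = printf "%016x" h

-- Streaming use: fold the update over chunks as they arrive.
hashChunks :: [B.ByteString] -> String
hashChunks = finalizeCtx . foldl' updateCtx initialCtx
```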

Monadic lists/ListT come from the "List" package on Hackage. (The ListT in transformers and mtl is broken, in that it only satisfies the monad laws when the underlying monad is commutative, and it also lacks useful functions like takeWhile.)
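
The idea behind a monadic list is that producing each cell runs an action, so consumers such as `takeWhile` and `foldlL` decide how much I/O actually happens. A minimal, hypothetical re-implementation shows the mechanism without the extra dependency; the `Stream` type and its functions are illustrations, not the List package's API.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Hypothetical minimal monadic list: running a step performs an effect in m
-- and yields either the end of the list or a head plus the rest.
newtype Stream m a = Stream { runStream :: m (Maybe (a, Stream m a)) }

-- Build a stream from a pure list (handy for trying things out).
fromListS :: Monad m => [a] -> Stream m a
fromListS = foldr (\x rest -> Stream (return (Just (x, rest)))) (Stream (return Nothing))

-- Stop at the first element failing p; later steps are never run,
-- so their effects (e.g. further reads) never happen.
takeWhileS :: Monad m => (a -> Bool) -> Stream m a -> Stream m a
takeWhileS p s = Stream $ do
  step <- runStream s
  return $ case step of
    Just (x, rest) | p x -> Just (x, takeWhileS p rest)
    _ -> Nothing

-- Strict left fold that drives the stream to its end.
foldlS :: Monad m => (b -> a -> b) -> b -> Stream m a -> m b
foldlS f = go
  where
    go acc s = do
      step <- runStream s
      case step of
        Nothing -> return acc
        Just (x, rest) -> let acc' = f acc x in acc' `seq` go acc' rest

-- An endless stream of chunks read from a handle; empty chunks mean EOF.
hChunks :: Int -> Handle -> Stream IO B.ByteString
hChunks n h = Stream $ do
  chunk <- B.hGet h n
  return (Just (chunk, hChunks n h))

-- Same pipeline shape as strictReadFileChunks + foldlL, counting bytes.
countBytes :: FilePath -> IO Int
countBytes path = withBinaryFile path ReadMode $ \h ->
  foldlS (\acc c -> acc + B.length c) 0 (takeWhileS (not . B.null) (hChunks 1024 h))
```

One design difference from the answer: here `withBinaryFile` owns the handle instead of the stream closing it on the empty chunk, which also releases it if the fold stops early or throws.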

08-28 23:06