Problem description
I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically a Haskell version of md5deep. All is fine and dandy unless the current directory has a very large number of files, in which case I get an error like:
<program>: <currentFile>: openBinaryFile: resource exhausted (Too many open files)
It seems Haskell's laziness is causing it not to close files, even after the corresponding line of output has been completed.
The relevant code is below. The function of interest is getList.
import Control.Monad (liftM)
import qualified Data.ByteString.Lazy as BS
import Data.Digest.MD5 (hash) -- Crypto package (assumed source of hash)
import Data.Word (Word8)
import Text.Printf (printf)

main :: IO ()
main = putStr . unlines =<< getList "."

getList :: FilePath -> IO [String]
getList p =
    let getFileLine path = liftM (\c -> (hex $ hash $ BS.unpack c) ++ " " ++ path) (BS.readFile path)
    in mapM getFileLine =<< getRecursiveContents p

hex :: [Word8] -> String
hex = concatMap (\x -> printf "%0.2x" (toInteger x))

getRecursiveContents :: FilePath -> IO [FilePath]
-- ^ Just gets the paths to all the files in the given directory.
Are there any ideas on how I could solve this problem?
The entire program is available here: http://haskell.pastebin.com/PAZm0Dcb
I have plenty of files that don't fit into RAM, so I am not looking for a solution that reads the entire file into memory at once.
Recommended answer
Lazy IO is very bug-prone.
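To make the failure mode concrete, here is a minimal sketch of how the handles pile up and how forcing each result avoids it. It assumes only base and bytestring; toyDigest is a hypothetical stand-in for MD5, and leakyLines and forcedLines are illustrative names, not part of the original program. It relies on the documented behaviour that the lazy readFile keeps its handle open until EOF is reached.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for an MD5 digest, so the sketch needs no extra packages.
toyDigest :: BL.ByteString -> Int
toyDigest = BL.foldl' step 7
  where
    step acc w = (acc * 31 + fromIntegral w) `mod` 1000003

-- Leaky: every BL.readFile opens a handle, but nothing reads to EOF until
-- the caller consumes the lines, so all handles are open at the same time.
leakyLines :: [FilePath] -> IO [String]
leakyLines = mapM $ \path ->
  fmap (\c -> show (toyDigest c) ++ " " ++ path) (BL.readFile path)

-- Forced: the digest is evaluated before moving on to the next file. That
-- reads the file to EOF, at which point the lazy reader closes the handle.
forcedLines :: [FilePath] -> IO [String]
forcedLines = mapM $ \path -> do
  contents <- BL.readFile path
  digest <- evaluate (toyDigest contents)
  return (show digest ++ " " ++ path)
```

The forced version still streams each file chunk by chunk, but it depends on the digest consuming the whole input, which is easy to get wrong.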
As dons suggested, you should use strict IO.
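As a rough illustration of strict IO that still avoids loading a whole file into memory, the sketch below folds over fixed-size strict chunks and lets withBinaryFile close the handle deterministically. It assumes only base and bytestring; foldFileChunks and byteCount are hypothetical names introduced here, and a real checksum would plug a streaming digest's update function and initial state in as the step and seed.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode(ReadMode), withBinaryFile)

-- Fold a strict accumulator over fixed-size chunks of a file.
-- withBinaryFile closes the handle when the fold finishes or throws.
foldFileChunks :: Int -> (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldFileChunks chunkSize step z path =
  withBinaryFile path ReadMode (go z)
  where
    go acc handle = do
      chunk <- B.hGetSome handle chunkSize
      if B.null chunk
        then return acc -- an empty chunk signals EOF
        else do
          let acc' = step acc chunk
          acc' `seq` go acc' handle -- force the accumulator at each step

-- Example use: count the bytes in a file, 64 KiB at a time.
byteCount :: FilePath -> IO Int
byteCount = foldFileChunks 65536 (\n chunk -> n + B.length chunk) 0
```

Only one chunk is resident at a time, and at most one handle is open per call, so mapping this over a long list of paths should not exhaust file descriptors.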
You can use a tool such as Iteratee to help you structure strict IO code. My favorite tool for this job is monadic lists.
import Control.Monad (when)
import Control.Monad.ListT (ListT) -- List
import Control.Monad.IO.Class (liftIO) -- transformers
import Data.Binary (encode) -- binary
import Data.Digest.Pure.MD5 -- pureMD5
import Data.List.Class (repeat, takeWhile, foldlL) -- List
import System.IO (IOMode(ReadMode), openFile, hClose)
import qualified Data.ByteString.Lazy as BS
import Prelude hiding (repeat, takeWhile)

hashFile :: FilePath -> IO BS.ByteString
hashFile =
    fmap (encode . md5Finalize) . foldlL md5Update md5InitialContext . strictReadFileChunks 1024

strictReadFileChunks :: Int -> FilePath -> ListT IO BS.ByteString
strictReadFileChunks chunkSize filename =
    takeWhile (not . BS.null) $ do
        handle <- liftIO $ openFile filename ReadMode
        repeat () -- this makes the lines below loop
        chunk <- liftIO $ BS.hGet handle chunkSize
        -- close the handle as soon as the empty (EOF) chunk is read
        when (BS.null chunk) . liftIO $ hClose handle
        return chunk
I used the "pureMD5" package here because "Crypto" doesn't seem to offer a "streaming" md5 implementation.
Monadic lists/ListT come from the "List" package on Hackage (transformers' and mtl's ListT are broken and also don't come with useful functions like takeWhile).