我编写了一个 Haskell 模块来按广度优先顺序列出目录的所有内容。下面是源代码。

module DirElements (dirElem) where

import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

dirElem :: FilePath -> IO [[FilePath]]
dirElem dirPath = iterateM (not.null) (concatMapM getDirectoryContents') [dirPath] >>= return.tail

getDirectoryContents' :: FilePath -> IO [FilePath]
getDirectoryContents' dirPath = do
  isDir <- do doesDirectoryExist dirPath
  if isDir then dirContent else return [] where
    dirContent = do
      contents <- getDirectoryContents dirPath
      return.(map (dirPath</>)).tail.tail $ contents

iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
iterateM fb f x = do --Notice: Due to the the implementation of >>=, iterateM can't be writen like iterate which gives a infinite list and have type of iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
  if fb x
    then do
      tail <- do {fx <- f x; iterateM fb f fx}
      return (x:tail)
    else return []

concatMapM :: Monad m => (a -> m[b]) -> [a] -> m[b]
concatMapM f list = mapM f list >>= return.concat

它工作正常,但是在大目录上执行时,它会“暂停”一段时间,然后弹出所有结果。

经过研究,我发现它与 sequence $ map return [1..]::[[Int]] 是同一个问题,请参阅 Why the Haskell sequence function can't be lazy or why recursive monadic functions can't be lazy

最佳答案

我修改了 Davorak 链接到的旧答案以使用新的 pipes 库。

它使用 StateP 来保持一个未遍历目录的队列,以便它可以进行广度优先遍历。为方便起见,它使用 MaybeP 退出循环。

import Control.Monad
import Control.Proxy
import Control.Proxy.Trans.Maybe
import Control.Proxy.Trans.State as S
import Data.Sequence hiding (filter)
import System.FilePath.Posix
import System.Directory

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
  = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

traverseTree
    :: (Proxy p)
    => FilePath
    -> () -> Producer (MaybeP (StateP (Seq FilePath) p)) FilePath IO r
traverseTree path () = do
    liftP $ S.modify (|> path)
    forever $ do
        x <- liftP $ S.gets viewl
        case x of
            EmptyL    -> mzero
            file :< s -> do
                liftP $ S.put s
                respond file
                p <- lift $ doesDirectoryExist file
                when p $ do
                    names <- lift $ getUsefulContents file
                    let namesfull = map (file </>) names
                    liftP $ forM_ namesfull $ \name ->
                        S.modify (|> name)

这定义了广度优先的文件懒惰生产者。如果将其连接到打印阶段,它将在遍历树时打印出文件:
main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> putStrLnD

懒惰的意思是,如果你只要求3个文件,它只会遍历树所需的次数来生成三个文件,然后它会停止:
    main = runProxy $ evalStateK empty $ runMaybeK $
        traverseTree "/tmp" >-> takeB_ 3 >-> putStrLnD

如果您想了解有关 pipes library 的更多信息,那么我建议您阅读 the tutorial

关于performance - 以广度优先的顺序列出目录的所有内容导致效率低下,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/14474545/

10-11 03:28