我编写了一个 Haskell 模块来按广度优先顺序列出目录的所有内容。下面是源代码。
module DirElements (dirElem) where
import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))
dirElem :: FilePath -> IO [[FilePath]]
dirElem dirPath = iterateM (not.null) (concatMapM getDirectoryContents') [dirPath] >>= return.tail
getDirectoryContents' :: FilePath -> IO [FilePath]
getDirectoryContents' dirPath = do
isDir <- do doesDirectoryExist dirPath
if isDir then dirContent else return [] where
dirContent = do
contents <- getDirectoryContents dirPath
return.(map (dirPath</>)).tail.tail $ contents
iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
iterateM fb f x = do --Notice: Due to the the implementation of >>=, iterateM can't be writen like iterate which gives a infinite list and have type of iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
if fb x
then do
tail <- do {fx <- f x; iterateM fb f fx}
return (x:tail)
else return []
concatMapM :: Monad m => (a -> m[b]) -> [a] -> m[b]
concatMapM f list = mapM f list >>= return.concat
它工作正常,但是在大目录上执行时,它会“暂停”一段时间,然后弹出所有结果。
经过研究,我发现它与
sequence $ map return [1..]::[[Int]]
是同一个问题,请参阅 Why the Haskell sequence function can't be lazy or why recursive monadic functions can't be lazy 最佳答案
我修改了 Davorak 链接到的旧答案以使用新的 pipes
库。
它使用 StateP
来保持一个未遍历目录的队列,以便它可以进行广度优先遍历。为方便起见,它使用 MaybeP
退出循环。
import Control.Monad
import Control.Proxy
import Control.Proxy.Trans.Maybe
import Control.Proxy.Trans.State as S
import Data.Sequence hiding (filter)
import System.FilePath.Posix
import System.Directory
getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
= fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path
traverseTree
:: (Proxy p)
=> FilePath
-> () -> Producer (MaybeP (StateP (Seq FilePath) p)) FilePath IO r
traverseTree path () = do
liftP $ S.modify (|> path)
forever $ do
x <- liftP $ S.gets viewl
case x of
EmptyL -> mzero
file :< s -> do
liftP $ S.put s
respond file
p <- lift $ doesDirectoryExist file
when p $ do
names <- lift $ getUsefulContents file
let namesfull = map (file </>) names
liftP $ forM_ namesfull $ \name ->
S.modify (|> name)
这定义了广度优先的文件懒惰生产者。如果将其连接到打印阶段,它将在遍历树时打印出文件:
main = runProxy $ evalStateK empty $ runMaybeK $
traverseTree "/tmp" >-> putStrLnD
懒惰的意思是,如果你只要求3个文件,它只会遍历树所需的次数来生成三个文件,然后它会停止:
main = runProxy $ evalStateK empty $ runMaybeK $
traverseTree "/tmp" >-> takeB_ 3 >-> putStrLnD
如果您想了解有关
pipes
library 的更多信息,那么我建议您阅读 the tutorial 。关于performance - 以广度优先的顺序列出目录的所有内容导致效率低下,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/14474545/