What problem do pipes/conduit try to solve?
I have seen people recommending pipes/conduit libraries for various lazy IO related tasks. What problem do these libraries solve exactly?

Also, when I try to use some hackage related libraries, it is highly likely there are three different versions. Example:

- attoparsec
- pipes-attoparsec
- attoparsec-conduit

This confuses me. For my parsing tasks should I use attoparsec or pipes-attoparsec/attoparsec-conduit? What benefit does the pipes/conduit version give me as compared to the plain vanilla attoparsec?

Solution

Lazy IO

Lazy IO works like this:

```haskell
readFile :: FilePath -> IO ByteString
```

where the ByteString is guaranteed to only be read chunk-by-chunk. To do so we could (almost) write

```haskell
-- given `readChunk`, which reads the chunk beginning at offset n
readChunk :: FilePath -> Int -> IO (Int, ByteString)

readFile :: FilePath -> IO ByteString
readFile fp = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    chunks      <- readChunks n'
    return (chunk <> chunks)
```

but here we note that the IO action readChunks n' is performed prior to returning even the partial result available as chunk. This means we're not lazy at all.
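To see the strictness concretely, here is a minimal, runnable sketch of the naive reader above. It simulates `readChunk` over an in-memory String (standing in for ByteString) and counts reads with an IORef; the file contents and chunk size are made-up values for illustration, not part of any real API.

```haskell
import Data.IORef

-- Illustrative in-memory "file" and chunk size (assumptions, not real API).
fileContents :: String
fileContents = "hello, lazy world!"

chunkSize :: Int
chunkSize = 4

-- Read the chunk starting at offset n, counting every "real" read.
readChunk :: IORef Int -> Int -> IO (Int, String)
readChunk counter n = do
  modifyIORef' counter (+ 1)
  let chunk = take chunkSize (drop n fileContents)
  return (n + length chunk, chunk)

-- The naive reader from the text: it recurses before returning anything.
readAll :: IORef Int -> IO String
readAll counter = go 0
  where
    go n = do
      (n', chunk) <- readChunk counter n
      if null chunk
        then return ""
        else do
          rest <- go n'
          return (chunk ++ rest)

main :: IO ()
main = do
  counter <- newIORef 0
  s <- readAll counter
  n <- readIORef counter
  putStrLn s  -- the whole file
  print n     -- all 5 chunks plus one EOF probe were read up front
```

The counter shows 6 before we have inspected a single byte of the result: every chunk was read eagerly, which is exactly the non-laziness the text describes.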
To combat this we use unsafeInterleaveIO:

```haskell
readFile fp = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    chunks      <- unsafeInterleaveIO (readChunks n')
    return (chunk <> chunks)
```

which causes readChunks n' to return immediately, thunking an IO action to be performed only when that thunk is forced.

That's the dangerous part: by using unsafeInterleaveIO we've delayed a bunch of IO actions to non-deterministic points in the future that depend upon how we consume our chunks of ByteString.

Fixing the problem with coroutines

What we'd like to do is slide a chunk-processing step in between the call to readChunk and the recursion on readChunks.

```haskell
readFileCo :: Monoid a => FilePath -> (ByteString -> IO a) -> IO a
readFileCo fp action = readChunks 0 where
  readChunks n = do
    (n', chunk) <- readChunk fp n
    a           <- action chunk
    as          <- readChunks n'
    return (a <> as)
```

Now we've got the chance to perform arbitrary IO actions after each small chunk is loaded. This lets us do much more work incrementally without completely loading the ByteString into memory. Unfortunately, it's not terrifically compositional: we need to build our consumption action and pass it to our ByteString producer in order for it to run.

Pipes-based IO

This is essentially what pipes solves: it allows us to compose effectful coroutines with ease. For instance, we now write our file reader as a Producer, which can be thought of as "streaming" the chunks of the file when its effect finally gets run.

```haskell
produceFile :: FilePath -> Producer ByteString IO ()
produceFile fp = produce 0 where
  produce n = do
    (n', chunk) <- liftIO (readChunk fp n)
    yield chunk
    produce n'
```

Note the similarities between this code and readFileCo above: we simply replace the call to the coroutine action with yielding the chunk we've produced so far.
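The readFileCo pattern can be exercised with a small, self-contained simulation. It uses an in-memory String as the "file" and an IORef log to watch the interleaving of reads and processing; all names here (fileContents, chunkSize, the "R"/"P" log) are illustrative assumptions, not library API.

```haskell
import Data.IORef

-- Illustrative in-memory "file" and chunk size (assumptions, not real API).
fileContents :: String
fileContents = "streaming"

chunkSize :: Int
chunkSize = 3

readChunk :: IORef String -> Int -> IO (Int, String)
readChunk logRef n = do
  modifyIORef' logRef (++ "R")  -- log: a chunk was read
  let chunk = take chunkSize (drop n fileContents)
  return (n + length chunk, chunk)

-- The coroutine reader from the text, adapted to the simulated readChunk.
readFileCo :: Monoid a => IORef String -> (String -> IO a) -> IO a
readFileCo logRef action = go 0
  where
    go n = do
      (n', chunk) <- readChunk logRef n
      if null chunk
        then return mempty
        else do
          a  <- action chunk
          as <- go n'
          return (a <> as)

main :: IO ()
main = do
  logRef <- newIORef ""
  lens <- readFileCo logRef $ \chunk -> do
    modifyIORef' logRef (++ "P")  -- log: a chunk was processed
    return [length chunk]
  events <- readIORef logRef
  print lens       -- chunk lengths, gathered incrementally
  putStrLn events  -- reads and processing alternate: RPRPRPR
```

The log shows each read (R) immediately followed by processing (P), unlike the naive reader, where every read happens before any processing can begin.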
This call to yield builds a Producer type instead of a raw IO action, one which we can compose with other Pipes types in order to build a nice consumption pipeline called an Effect IO ().

All of this pipe building gets done statically, without actually invoking any of the IO actions. This is how pipes lets you write your coroutines more easily. All of the effects get triggered at once when we call runEffect in our main IO action.

```haskell
runEffect :: Effect IO () -> IO ()
```

Attoparsec

So why would you want to plug attoparsec into pipes? Well, attoparsec is optimized for lazy parsing. If you are producing the chunks fed to an attoparsec parser in an effectful way then you'll be at an impasse. You could:

1. Use strict IO and load the entire string into memory only to consume it lazily with your parser. This is simple and predictable, but inefficient.
2. Use lazy IO and lose the ability to reason about when your production IO effects will actually get run, causing possible resource leaks or closed-handle exceptions according to the consumption schedule of your parsed items. This is more efficient than (1) but can easily become unpredictable; or,
3. Use pipes (or conduit) to build up a system of coroutines which include your lazy attoparsec parser, allowing it to operate on as little input as it needs while producing parsed values as lazily as possible across the entire stream.
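To make the "pipe building is static" point concrete, here is a miniature, self-contained model of the Producer idea, written from scratch for illustration. The real pipes types differ, but the shape is the same: constructing a Producer performs no IO; it only builds a data structure, and our stand-in for runEffect is what finally runs the effects. Everything here, including the simulated readChunk, is a sketch under those assumptions, not the pipes API.

```haskell
-- A free-monad-style model of a Producer: each step either yields a
-- value downstream, performs an effect, or finishes.
data Producer b m r
  = Yield b (Producer b m r)  -- emit a value, then continue
  | M (m (Producer b m r))    -- an effect returning the rest of the pipe
  | Return r                  -- done

yield :: b -> Producer b m ()
yield b = Yield b (Return ())

liftP :: Monad m => m r -> Producer b m r
liftP m = M (fmap Return m)

-- Monadic sequencing for the mini Producer (explicit, to avoid instances).
bindP :: Monad m => Producer b m r -> (r -> Producer b m s) -> Producer b m s
bindP (Return r)  f = f r
bindP (Yield b p) f = Yield b (p `bindP` f)
bindP (M m)       f = M (fmap (`bindP` f) m)

-- Simulated readChunk over an in-memory "file" (made-up contents).
readChunk :: Int -> IO (Int, String)
readChunk n = return (n + length chunk, chunk)
  where chunk = take 5 (drop n "some file contents")

-- produceFile from the text, rebuilt on the mini Producer. Nothing runs
-- yet: this only builds a description of the stream.
produceFile :: Producer String IO ()
produceFile = go 0
  where
    go n = liftP (readChunk n) `bindP` \(n', chunk) ->
      if null chunk
        then Return ()
        else yield chunk `bindP` \_ -> go n'

-- Our stand-in for runEffect: feed every yielded value to a handler.
runEffect :: Monad m => (b -> m ()) -> Producer b m r -> m r
runEffect _ (Return r)  = return r
runEffect k (Yield b p) = k b >> runEffect k p
runEffect k (M m)       = m >>= runEffect k

main :: IO ()
main = runEffect putStrLn produceFile  -- prints the chunks one by one
```

In the real library, composition with other Pipes types replaces the explicit handler passed to runEffect here, but the division of labor is the same: assembly is pure, execution happens in one place.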