In Clojure, are lazy seqs always chunked?

Question

I was under the impression that the lazy seqs were always chunked.

=> (take 1 (map #(do (print \.) %) (range)))
(................................0)

As expected 32 dots are printed because the lazy seq returned by range is chunked into 32 element chunks. However, when instead of range I try this with my own function get-rss-feeds, the lazy seq is no longer chunked:

=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")

Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds is not chunked. Indeed:

=> (chunked-seq? (seq (range)))
true

=> (chunked-seq? (seq (get-rss-feeds r)))
false

Here is the source of get-rss-feeds:

(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))

So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
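
To illustrate the point about chunkiness depending on the producer, here is a minimal sketch (not from the original post). It assumes that a vector's seq is chunked and that a seq built with iterate is not, which holds in the Clojure versions I am aware of; the var names are made up for illustration. As far as I know, in Clojure 1.7 and later the no-argument (range) is itself built on iterate, so the 32-dot result above may not reproduce there.

```clojure
;; Assumption: a vector's seq is chunked, a seq built with iterate is not.
(def from-vector  (seq (vec (range 64))))
(def from-iterate (seq (iterate inc 0)))

(def vector-chunked?  (chunked-seq? from-vector))
(def iterate-chunked? (chunked-seq? from-iterate))

;; map does not decide chunking itself: it follows whatever its input does.
(def mapped-vector-chunked?  (chunked-seq? (seq (map inc from-vector))))
(def mapped-iterate-chunked? (chunked-seq? (seq (map inc from-iterate))))
```

Under these assumptions the two vector-based checks come out true and the two iterate-based checks come out false, which would explain why map over a non-chunked source, such as the one get-rss-feeds presumably starts from, stays non-chunked.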

Here's why I need to know.

I have the following code: (get-rss-entry (get-rss-feeds h-res) url)

The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.

The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an http request across the network to fetch a new rss feed. To minimize the number of http requests it's important to examine the sequence one-by-one and stop as soon as there is a match.

Here is the code:

(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))

entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.

I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.

Am I using lazy seq non-idiomatically? Would loop/recur be a better option?
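
One option that does not depend on chunking at all is to apply entry-with-url directly with some rather than through map, since some calls its predicate on one element at a time and stops at the first truthy result. The following is only a minimal sketch, assuming entries are never nil or false. The entry-with-url below is a hypothetical stub that counts calls instead of making HTTP requests, and get-rss-entry-some, fetch-count, and the feed names are invented for the demonstration.

```clojure
;; Hypothetical stand-in for the real entry-with-url: it counts calls instead
;; of making an HTTP request and "finds" a match only in the feed "feed-3".
(def fetch-count (atom 0))

(defn entry-with-url [feed url]
  (swap! fetch-count inc)
  (if (= feed "feed-3")
    [{:link url :feed feed}]
    []))

(defn get-rss-entry-some
  "Returns the first matching entry, examining one feed at a time."
  [feeds url]
  (some #(first (entry-with-url % url)) feeds))

;; The seq of a vector is chunked, yet only the first three feeds are examined.
(def feeds (mapv #(str "feed-" %) (range 1 65)))
(def result (get-rss-entry-some feeds "http://example.com/post"))
```

With this stub, result is the entry from "feed-3" and fetch-count ends at 3, even though the input is a chunked seq.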

Recommended answer

Relying on the vague details of chunking seems unwise, as you mention above. Explicitly "un-chunking" in cases where you really need the seq not to be chunked is also wise, because if your code later changes in a way that chunks it, things won't break. On another note, if you need actions to be sequential, agents are a great tool: you could send the download functions to an agent, and they will then be run one at a time and only once, regardless of how you evaluate the function. At some point you may want to pmap your sequence, and then even un-chunking will not work, though using an atom will continue to work correctly.
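
The explicit un-chunking mentioned above is commonly written as a small wrapper that rebuilds the seq one cons cell at a time. A minimal sketch follows, not taken from the original post: unchunk and count-calls are hypothetical helper names, and a vector is used as the chunked source because its seq is served in 32-element chunks.

```clojure
(defn unchunk
  "Wraps s in a lazy seq that yields one element at a time (hypothetical helper)."
  [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

(defn count-calls
  "Returns how many times the mapped function ran to produce the first element."
  [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

(def source (vec (range 64)))

;; Mapping over the chunked vector seq runs the function for a whole chunk.
(def chunked-calls (count-calls source))             ; 32

;; Mapping over the unchunked wrapper runs it once.
(def unchunked-calls (count-calls (unchunk source))) ; 1
```

Note that the wrapper has to be applied to the source before map; wrapping the result of map would be too late, because the chunk has already been computed by the time the first element is requested.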
