Problem Description
I have a list of URLs of pages I want to download concurrently using HttpClient. The list of URLs can be large (100 or more!)
I currently have the following code:
var urls = new List<string>
{
    "http://www.amazon.com",
    "http://www.bing.com",
    "http://www.facebook.com",
    "http://www.twitter.com",
    "http://www.google.com"
};
var client = new HttpClient();
var contents = urls
.ToObservable()
.SelectMany(uri => client.GetStringAsync(new Uri(uri, UriKind.Absolute)));
contents.Subscribe(Console.WriteLine);
The problem: due to the use of SelectMany, a big bunch of Tasks are created almost at the same time. It seems that if the list of URLs is big enough, a lot of Tasks time out (I'm getting "A task was canceled" exceptions).
So I thought there should be a way, maybe using some kind of Scheduler, to limit the number of concurrent Tasks, not allowing more than 5 or 6 at a given time.
This way I could get concurrent downloads without launching too many tasks that may stall, like they do right now.
How can I do that so I don't get saturated with lots of timed-out Tasks?
Recommended Answer
Remember that SelectMany() is actually Select().Merge(). While SelectMany does not have a maxConcurrent parameter, Merge() does. So you can use that.
From your example, you can do this:
var urls = new List<string>
{
    "http://www.amazon.com",
    "http://www.bing.com",
    "http://www.facebook.com",
    "http://www.twitter.com",
    "http://www.google.com"
};
var client = new HttpClient();
var contents = urls
.ToObservable()
.Select(uri => Observable.FromAsync(() => client.GetStringAsync(uri)))
.Merge(2); // 2 maximum concurrent requests!
contents.Subscribe(Console.WriteLine);
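To see the throttling behave as claimed without hitting the network, here is a minimal, self-contained sketch (assuming the System.Reactive NuGet package); the Task.Delay calls are stand-ins for the real client.GetStringAsync calls, and the demo counts how many simulated downloads overlap:

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

class MergeThrottleDemo
{
    static readonly object Gate = new object();
    static int current;      // simulated downloads running right now
    static int maxObserved;  // highest overlap seen so far

    static async Task Main()
    {
        var results = Observable.Range(1, 10)
            .Select(i => Observable.FromAsync(async () =>
            {
                // Track how many "downloads" are in flight at once.
                lock (Gate) { current++; maxObserved = Math.Max(maxObserved, current); }
                await Task.Delay(100); // stand-in for client.GetStringAsync(...)
                lock (Gate) current--;
                return i;
            }))
            .Merge(2); // at most 2 inner observables subscribed at a time

        await results.LastAsync();
        Console.WriteLine($"Max concurrency observed: {maxObserved}");
    }
}
```

Because Merge(2) only subscribes to two inner observables at a time, and Observable.FromAsync does not start its task until subscription, maxObserved should never exceed 2, which is exactly the throttling behavior the question asks for.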
This concludes this article on limiting concurrent requests with Rx and SelectMany. Hopefully the recommended answer above is helpful.