Problem description
How do I turn the following into a Parallel.ForEach?
    public async void getThreadContents(String[] threads)
    {
        HttpClient client = new HttpClient();
        List<String> usernames = new List<String>();
        int i = 0;
        foreach (String url in threads)
        {
            i++;
            progressLabel.Text = "Scanning thread " + i.ToString() + "/" + threads.Count<String>();
            HttpResponseMessage response = await client.GetAsync(url);
            String content = await response.Content.ReadAsStringAsync();
            String user;
            Predicate<String> userPredicate;
            foreach (Match match in regex.Matches(content))
            {
                user = match.Groups[1].ToString();
                userPredicate = (String x) => x == user;
                if (usernames.Find(userPredicate) != user)
                {
                    usernames.Add(match.Groups[1].ToString());
                }
            }
            progressBar1.PerformStep();
        }
    }
I coded it on the assumption that asynchronous and parallel processing were the same thing, and I just realized they aren't. I took a look at all the questions I could find on this, and I really can't seem to find an example that does it for me. Most of them lack readable variable names. Using single-letter variable names that don't explain what they contain is a horrible way to present an example.
I normally have between 300 and 2000 entries in the array named threads (it contains URLs of forum threads), and it would seem that parallel processing would speed up the execution, given the many HTTP requests.
Do I have to remove all the asynchrony (I have nothing async outside the foreach, only variable definitions) before I can use Parallel.ForEach? How should I go about doing this? Can I do this without blocking the main thread?
I am using .NET 4.5 by the way.
Recommended answer
Asynchronous processing and parallel processing are quite different. If you don't understand the difference, I think you should first read more about it (for example, the question "What is the relation between asynchronous and parallel programming in C#?").
Now, what you want to do is actually not that simple, because you want to process a big collection asynchronously, with a specific degree of parallelism (8). With synchronous processing, you could use Parallel.ForEach() (along with ParallelOptions to configure the degree of parallelism), but there is no simple alternative that would work with async.
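For comparison, here is a minimal sketch of the synchronous case mentioned above, assuming a blocking download. DownloadBlocking, the example URLs and the class name are hypothetical stand-ins, not code from the question. It also hints at why async does not fit: Parallel.ForEach() takes an Action<T>, so an async lambda would become async void and the loop would return before the downloads finish.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SyncParallelSketch
{
    // Hypothetical stand-in for a blocking (synchronous) download.
    private static string DownloadBlocking(string url)
    {
        Thread.Sleep(50); // simulates network latency
        return "content of " + url;
    }

    public static ConcurrentBag<string> Run(string[] urls, int maxDegreeOfParallelism)
    {
        // ConcurrentBag is thread-safe; a plain List<string> would not be.
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        // Blocks the calling thread until every URL has been processed.
        Parallel.ForEach(urls, options, url => results.Add(DownloadBlocking(url)));
        return results;
    }

    public static void Main()
    {
        string[] urls = Enumerable.Range(1, 20)
            .Select(n => "http://example.invalid/thread/" + n)
            .ToArray();
        Console.WriteLine(Run(urls, 8).Count);
    }
}
```

Note that this version ties up one thread per in-flight request and blocks its caller, so calling it directly from a UI event handler would freeze the UI.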
In your code, this is complicated by the fact that you expect everything to execute on the UI thread. (Though ideally, you shouldn't access the UI directly from your computation. Instead, you should use IProgress<T>, which would mean the code no longer has to execute on the UI thread.)
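As a rough illustration of the IProgress suggestion, the sketch below has the scanning loop report a count instead of touching any control. ScanAsync and SyncProgress are hypothetical names introduced here, and Task.Delay stands in for the real HTTP request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical synchronous IProgress<int>, used so the console demo prints in order.
public sealed class SyncProgress : IProgress<int>
{
    private readonly Action<int> handler;
    public SyncProgress(Action<int> handler) { this.handler = handler; }
    public void Report(int value) { handler(value); }
}

public static class ProgressSketch
{
    // The loop only reports numbers; it never reads or writes a UI control.
    public static async Task ScanAsync(IReadOnlyList<string> urls, IProgress<int> progress)
    {
        for (int i = 0; i < urls.Count; i++)
        {
            await Task.Delay(10); // stand-in for the real HTTP request
            if (progress != null)
                progress.Report(i + 1); // number of threads scanned so far
        }
    }

    public static void Main()
    {
        var urls = new[] { "url1", "url2", "url3" };
        var progress = new SyncProgress(
            done => Console.WriteLine("Scanning thread " + done + "/" + urls.Length));
        ScanAsync(urls, progress).Wait();
    }
}
```

In a WinForms or WPF application you would normally pass a Progress<int> created on the UI thread, for example one whose handler sets progressLabel.Text. Progress<T> captures the synchronization context at construction and invokes its handler there, so the scanning code itself can run anywhere.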
Probably the best way to do this in .NET 4.5 is to use TPL Dataflow. Its ActionBlock does exactly what you want, but it can be quite verbose (because it's more flexible than what you need). So it makes sense to create a helper method:
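    // Requires the TPL Dataflow library (namespace System.Threading.Tasks.Dataflow).
    public static Task AsyncParallelForEach<T>(
        IEnumerable<T> source, Func<T, Task> body,
        int maxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
        TaskScheduler scheduler = null)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };
        if (scheduler != null)
            options.TaskScheduler = scheduler;

        // The block runs body for each posted item, at most
        // maxDegreeOfParallelism items at a time.
        var block = new ActionBlock<T>(body, options);

        foreach (var item in source)
            block.Post(item);

        // Signal that no more items will arrive, then return a task
        // that completes once all posted items have been processed.
        block.Complete();
        return block.Completion;
    }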
In your case, you would use it like this:
    await AsyncParallelForEach(
        threads, async url => await DownloadUrl(url), 8,
        TaskScheduler.FromCurrentSynchronizationContext());
Here, DownloadUrl() is an async Task method that processes a single URL (the body of your loop), 8 is the degree of parallelism (it probably shouldn't be a literal constant in real code), and FromCurrentSynchronizationContext() makes sure the code executes on the UI thread.
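If adding the TPL Dataflow package is not an option, similar throttling can be approximated with SemaphoreSlim and Task.WhenAll from the base class library. This is a different technique from the ActionBlock helper above, shown only as a sketch: ForEachThrottledAsync, FakeDownloadAsync and MeasurePeak are hypothetical names, and Task.Delay stands in for the HTTP request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Runs body for every item, letting at most maxDegreeOfParallelism run at once.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> source, Func<T, Task> body, int maxDegreeOfParallelism)
    {
        var gate = new SemaphoreSlim(maxDegreeOfParallelism);
        var tasks = new List<Task>();
        foreach (T item in source)
        {
            await gate.WaitAsync(); // waits without blocking a thread when the limit is reached
            tasks.Add(RunAndReleaseAsync(item, body, gate));
        }
        await Task.WhenAll(tasks);
    }

    private static async Task RunAndReleaseAsync<T>(
        T item, Func<T, Task> body, SemaphoreSlim gate)
    {
        try { await body(item); }
        finally { gate.Release(); } // free the slot even if body throws
    }

    // Demo scaffolding (assumption): counts how many bodies run at the same time.
    private static readonly object sync = new object();
    private static int running, peak;

    private static async Task FakeDownloadAsync(string url)
    {
        lock (sync) { running++; if (running > peak) peak = running; }
        await Task.Delay(20); // stand-in for client.GetAsync(url)
        lock (sync) { running--; }
    }

    public static int MeasurePeak(int urlCount, int limit)
    {
        lock (sync) { running = 0; peak = 0; }
        IEnumerable<string> urls = Enumerable.Range(1, urlCount).Select(n => "url" + n);
        ForEachThrottledAsync(urls, url => FakeDownloadAsync(url), limit).Wait();
        lock (sync) { return peak; }
    }

    public static void Main()
    {
        Console.WriteLine("peak concurrency: " + MeasurePeak(40, 8));
    }
}
```

When ForEachThrottledAsync is awaited from a UI thread, the code after each await resumes on that thread by default, so no explicit scheduler argument is involved. The blocking Wait() call in MeasurePeak is only suitable for the console demo; in UI code you would await the method instead.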