Multiple parallel executions of WebClient as tasks (TPL)

Question

I am testing parallel execution of IWebDriver vs. WebClient, to see whether there is a performance difference and how big it is.

Before I got that far, I ran into a problem with a simple parallel invocation of WebClient.

It seems the code is never executed. I put a breakpoint in AgilityPacDocExtraction, on the line that calls WebClient.DownloadString(URL),

but the program exits instead of stopping there, so I cannot step into the call and see the resulting string.

The plan was to have a single method for all the actions that need to be taken, with a "mode" selector for each action, and then a simple foreach that iterates over all available enum values (the modes).

The main execution:

    static void Main(string[] args)
    {
        EnumForEach<Action>(Execute);
        Task.WaitAll();
    }

    public static void EnumForEach<Mode>(Action<Mode> Exec)
    {
        foreach (Mode mode in Enum.GetValues(typeof(Mode)))
        {
            Mode Curr = mode;
            Task.Factory.StartNew(() => Exec(Curr));
        }
    }

The mode / Action selector:

    enum Action
    {
        Act1, Act2
    }

The actual execution:

    static BrowsresFactory.IeEngine IeNgn = new BrowsresFactory.IeEngine();
    static string
        FlNm = Environment.CurrentDirectory,
        URL = "",
        TmpHtm = "";

    static void Execute(Action Exc)
    {
        switch (Exc)
        {
            case Action.Act1:
                break;

            case Action.Act2:
                URL = "UrlofUrChoise here...";
                FlNm += "\\TempHtm.htm";
                TmpHtm = IeNgn.AgilityPacDocExtraction(URL).GetElementbyId("Dv_Main").InnerHtml;
                File.WriteAllText(FlNm, TmpHtm);
                break;
        }
    }

The class that holds the WebClient follows. The IWebDriver (Selenium) part is left out to keep the post short, and it is not relevant for now.

    class BrowsresFactory
    {
        public class IeEngine
        {
            private WebClient WC = new WebClient();
            private string tmpExtractedPageValue = "";
            private HtmlAgilityPack.HtmlDocument retAglPacHtmDoc = new HtmlAgilityPack.HtmlDocument();

            public HtmlAgilityPack.HtmlDocument AgilityPacDocExtraction(string URL)
            {
                WC.Encoding = Encoding.GetEncoding("UTF-8");
                tmpExtractedPageValue = WC.DownloadString(URL); // <--- tried to break here
                retAglPacHtmDoc.LoadHtml(tmpExtractedPageValue);
                return retAglPacHtmDoc;
            }
        }
    }

The problem is that I can't see any content in the file that was supposed to be written with the value extracted from the WebClient. In addition, in debug mode I couldn't step into the line marked with a comment in the code above. What am I doing wrong here?

Solution

I managed to solve the issue by using WebClient, which I think requires fewer resources than WebDriver; if that's true, it should also take less time.

This is the code:

    public void StartEngins()
    {
        const string URL_Dollar = "URL_Dollar";
        const string URL_UpdateUsersTimeOut = "URL_UpdateUsersTimeOut";

        // Map a logical name to each URL to fetch.
        var urlList = new Dictionary<string, string>();
        urlList.Add(URL_Dollar, "http://bing.com");
        urlList.Add(URL_UpdateUsersTimeOut, "http://localhost:..../.......aspx");

        // Thread-safe store for the downloaded HTML.
        var htmlDictionary = new ConcurrentDictionary<string, string>();

        // Parallel.ForEach blocks until every download has completed.
        // (The Download method itself is not shown in this post.)
        Parallel.ForEach(
            urlList.Values,
            new ParallelOptions { MaxDegreeOfParallelism = 20 },
            url => Download(url, htmlDictionary));

        foreach (var pair in htmlDictionary)
        {
            // Process(pair);
            MessageBox.Show(pair.Value);
        }
    }

    public class SmartWebClient : WebClient
    {
        private readonly int maxConcurentConnectionCount;

        public SmartWebClient(int maxConcurentConnectionCount = 20)
        {
            this.maxConcurentConnectionCount = maxConcurentConnectionCount;
        }

        protected override WebRequest GetWebRequest(Uri address)
        {
            var httpWebRequest = (HttpWebRequest)base.GetWebRequest(address);
            if (httpWebRequest == null)
            {
                return null;
            }

            // Raise the per-host connection limit so parallel requests are not throttled.
            if (maxConcurentConnectionCount != 0)
            {
                httpWebRequest.ServicePoint.ConnectionLimit = maxConcurentConnectionCount;
            }

            return httpWebRequest;
        }
    }
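
The Download method called inside Parallel.ForEach is not included in the post. What follows is only a minimal sketch of what such a helper might look like, not the author's actual code. The class name DownloadSketch, the optional fetch parameter (added here so the helper can be exercised without a network) and the "ERROR: " prefix are all assumptions made for illustration. It creates one WebClient per call, because a single WebClient instance does not support concurrent I/O operations.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Text;

public static class DownloadSketch
{
    // Hypothetical helper: stores the page body (or an error marker) under its URL.
    // 'fetch' is an assumption for illustration; when omitted, a fresh WebClient is used.
    public static void Download(
        string url,
        ConcurrentDictionary<string, string> htmlDictionary,
        Func<string, string> fetch = null)
    {
        fetch = fetch ?? FetchWithWebClient;
        try
        {
            // The indexer setter of ConcurrentDictionary is safe to call from many threads.
            htmlDictionary[url] = fetch(url);
        }
        catch (WebException ex)
        {
            // Record the failure instead of letting one bad URL abort the whole loop.
            htmlDictionary[url] = "ERROR: " + ex.Message;
        }
    }

    private static string FetchWithWebClient(string url)
    {
        // One client per call: WebClient instances cannot run concurrent operations.
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

With a helper of this shape, the lambda `url => Download(url, htmlDictionary)` from StartEngins compiles unchanged, and the SmartWebClient above could be swapped in for the plain WebClient if a higher connection limit is wanted.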

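As a side note on the Main method in the question: Task.WaitAll() called with no arguments has no tasks to wait for and returns immediately. Tasks started with Task.Factory.StartNew run on background thread-pool threads, so the process can end before any of them reaches DownloadString, which would likely explain both the empty file and the breakpoint that is never hit. The sketch below keeps the enum-driven design but collects the tasks and waits on them. The names EnumTasks, RunAll and the Mode enum are placeholders chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class EnumTasks
{
    // Placeholder enum standing in for the Action enum from the question.
    public enum Mode { Act1, Act2 }

    // Starts one task per enum value, then blocks until all of them have finished.
    public static void RunAll<TEnum>(Action<TEnum> exec)
    {
        var tasks = new List<Task>();
        foreach (TEnum value in Enum.GetValues(typeof(TEnum)))
        {
            TEnum current = value; // per-iteration copy captured by the lambda
            tasks.Add(Task.Factory.StartNew(() => exec(current)));
        }

        // Passing the started tasks is what makes WaitAll actually wait.
        Task.WaitAll(tasks.ToArray());
    }
}
```

Called as `EnumTasks.RunAll<EnumTasks.Mode>(Execute)`, this returns only after every mode has run. Note that if any task throws, Task.WaitAll rethrows the failures wrapped in an AggregateException.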

08-19 23:35