Problem description
I'm writing a multithreaded web crawler that uses hundreds of threads to perform many concurrent HttpWebRequest calls every second. The application works well, but occasionally (at random) one of the requests hangs on the stream returned by GetResponseStream(), completely ignoring the timeout. This happens when I perform hundreds of requests concurrently, and it means the crawl never finishes. Strangely, when Fiddler is running this never happens and the application never hangs. It is really hard to debug because it happens randomly.
I've tried to set
Keep-Alive = false
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
but I still get the same strange behavior. Any ideas?
Thanks
HttpWebRequest code:
public static string RequestHttp(string url, string referer, ref CookieContainer cookieContainer_0, IWebProxy proxy)
{
    string str = string.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = randomuseragent();
    request.ContentType = "application/x-www-form-urlencoded";
    request.Accept = "*/*";
    request.CookieContainer = cookieContainer_0;
    request.Proxy = proxy;
    request.Timeout = 15000;
    request.Referer = referer;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream responseStream = response.GetResponseStream())
        {
            List<byte> list = new List<byte>();
            byte[] buffer = new byte[0x400];
            int count = responseStream.Read(buffer, 0, buffer.Length);
            while (count != 0)
            {
                list.AddRange(buffer.ToList<byte>().GetRange(0, count));
                if (list.Count >= 0x100000)
                {
                    break;
                }
                count = 0;
                try
                {
                    // HERE IT HANGS SOMETIMES:
                    count = responseStream.Read(buffer, 0, buffer.Length);
                    continue;
                }
                catch
                {
                    continue;
                }
            }
            int num2 = 0x200 * 0x400;
            if (list.Count >= num2)
            {
                list.RemoveRange((num2 * 3) / 10, list.Count - num2);
            }
            byte[] bytes = list.ToArray();
            str = Encoding.Default.GetString(bytes);
            Encoding encoding = Encoding.Default;
            if (str.ToLower().IndexOf("charset=") > 0)
            {
                encoding = GetEncoding(str);
            }
            else
            {
                try
                {
                    encoding = Encoding.GetEncoding(response.CharacterSet);
                }
                catch
                {
                }
            }
            str = encoding.GetString(bytes);
        }
    }
    return str.Trim();
}
Recommended answer
The ReadWriteTimeout property is used when writing to the stream returned by the GetRequestStream method or reading from the stream returned by the GetResponseStream method.
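The Timeout property you set only covers the GetResponse() call, so the Read() calls in your loop are governed by ReadWriteTimeout instead. You can set the ReadWriteTimeout property on the request to bound reads from and writes to these streams in your project.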
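As a rough illustration, here is a minimal sketch of how the request in the question might set both timeouts. It assumes the hang is in the blocking Read() on the response stream. The class and helper names (CrawlerHttp, CreateRequest, ReadCapped) are hypothetical, and the 15000 ms value simply mirrors the Timeout used in the question.

```csharp
using System;
using System.IO;
using System.Net;

public static class CrawlerHttp
{
    // Hypothetical helper: creates a request with both timeouts set.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Timeout applies to GetResponse() and GetRequestStream().
        request.Timeout = 15000;

        // ReadWriteTimeout applies to each Read/Write on the request and
        // response streams. Its default is 300,000 ms (5 minutes).
        request.ReadWriteTimeout = 15000;

        return request;
    }

    // Hypothetical helper: reads at most maxBytes from the stream.
    // With ReadWriteTimeout set, a stalled Read on a response stream is
    // expected to throw instead of blocking, so let the caller handle it.
    public static byte[] ReadCapped(Stream stream, int maxBytes)
    {
        using (MemoryStream result = new MemoryStream())
        {
            byte[] buffer = new byte[1024];
            while (result.Length < maxBytes)
            {
                int wanted = (int)Math.Min(buffer.Length, maxBytes - result.Length);
                int count = stream.Read(buffer, 0, wanted);
                if (count == 0)
                {
                    break; // end of stream
                }
                result.Write(buffer, 0, count);
            }
            return result.ToArray();
        }
    }
}
```

With something like this, the loop in the question could become `byte[] bytes = CrawlerHttp.ReadCapped(responseStream, 0x100000);`. Note that the bare catch in the original loop would swallow a read timeout and quietly return a partial page, so it may be better to catch the exception in the caller and log or retry that URL.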