Problem description
I am working on a link checker. In general I can perform HEAD requests, but some sites seem to disable this verb, so on failure I also need to perform a GET request (to double-check that the link is really dead).
I use the following code as my link tester:
using System;
using System.Net;

public class ValidateResult
{
    public HttpStatusCode? StatusCode { get; set; }
    public Uri RedirectResult { get; set; }
    public WebExceptionStatus? WebExceptionStatus { get; set; }
}

public ValidateResult Validate(Uri uri, bool useHeadMethod = true,
    bool enableKeepAlive = false, int timeoutSeconds = 30)
{
    ValidateResult result = new ValidateResult();
    HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
    if (useHeadMethod)
    {
        request.Method = "HEAD";
    }
    else
    {
        request.Method = "GET";
    }
    // always compress, if you get back a 404 from a HEAD it can be quite big.
    request.AutomaticDecompression = DecompressionMethods.GZip;
    request.AllowAutoRedirect = false;
    request.UserAgent = UserAgentString; // defined elsewhere in the enclosing class
    request.Timeout = timeoutSeconds * 1000;
    request.KeepAlive = enableKeepAlive;
    HttpWebResponse response = null;
    try
    {
        response = request.GetResponse() as HttpWebResponse;
        result.StatusCode = response.StatusCode;
        if (response.StatusCode == HttpStatusCode.Redirect ||
            response.StatusCode == HttpStatusCode.MovedPermanently ||
            response.StatusCode == HttpStatusCode.SeeOther)
        {
            try
            {
                // resolve a possibly relative Location header against the requested URI
                Uri targetUri = new Uri(uri, response.Headers["Location"]);
                var scheme = targetUri.Scheme.ToLower();
                if (scheme == "http" || scheme == "https")
                {
                    result.RedirectResult = targetUri;
                }
                else
                {
                    // this little gem was born out of http://tinyurl.com/18r
                    // redirecting to about:blank
                    result.StatusCode = HttpStatusCode.SwitchingProtocols;
                    result.WebExceptionStatus = null;
                }
            }
            catch (UriFormatException)
            {
                // another gem... people sometimes redirect to http://nonsense:port/yay
                result.StatusCode = HttpStatusCode.SwitchingProtocols;
                result.WebExceptionStatus = WebExceptionStatus.NameResolutionFailure;
            }
        }
    }
    catch (WebException ex)
    {
        result.WebExceptionStatus = ex.Status;
        response = ex.Response as HttpWebResponse;
        if (response != null)
        {
            result.StatusCode = response.StatusCode;
        }
    }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }
    return result;
}
This all works fine and dandy, except that when I perform a GET request the entire payload gets downloaded (I watched this in Wireshark).
Is there any way to configure the underlying ServicePoint or the HttpWebRequest not to buffer or eagerly load the response body at all?
(If I were hand-coding this, I would set the TCP receive window really low, grab only enough packets to get the headers, and stop ACKing TCP packets as soon as I had enough information.)
For those wondering what this is meant to achieve: I don't want to download a 40K 404 page every time I get a 404. Doing this a few hundred thousand times is expensive on the network.
Recommended answer
When you do a GET, the server will start sending data from the start of the file to the end, unless you interrupt it. Granted, at 10 Mb/sec that's going to be roughly a megabyte per second, so if the file is small you'll get the whole thing. You can minimize the amount you actually download in a couple of ways.
First, you can call request.Abort after getting the response and before calling response.Close. That will ensure that the underlying code doesn't try to download the whole thing before closing the response. Whether this helps on small files, I don't know. I do know that it will prevent your application from hanging when it's trying to download a multi-gigabyte file.
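A minimal sketch of the Abort-before-Close idea, assuming a plain HttpWebRequest as in the question. The class name AbortExample and the method GetStatusOnly are hypothetical names used only for illustration, and how much of a small body has already arrived by the time GetResponse returns is not something this code controls.

```csharp
using System;
using System.Net;

static class AbortExample
{
    // Hypothetical helper: returns the status code of a GET without reading the body.
    // A null result means no HTTP response was received at all (DNS failure, timeout, ...).
    public static HttpStatusCode? GetStatusOnly(Uri uri, int timeoutSeconds = 30)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        request.Timeout = timeoutSeconds * 1000;

        HttpWebResponse response = null;
        try
        {
            // The status code is available here; the body stream has not been read.
            response = (HttpWebResponse)request.GetResponse();
            return response.StatusCode;
        }
        catch (WebException ex)
        {
            // 4xx/5xx statuses surface as exceptions; the response is still attached.
            response = ex.Response as HttpWebResponse;
            return response != null ? response.StatusCode : (HttpStatusCode?)null;
        }
        finally
        {
            // Abort first, so that Close does not try to read the rest of the body.
            request.Abort();
            if (response != null)
            {
                response.Close();
            }
        }
    }
}
```

Aborting tears down the connection, so this trades connection reuse for not downloading the remaining payload.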
The other thing you can do is request a range, rather than the entire file. See the AddRange method and its overloads. You could, for example, write request.AddRange(0, 511), which asks for only the first 512 bytes of the file. (The single-argument form request.AddRange(512) asks for everything from byte 512 onward, which is not what you want here.) This depends, of course, on the server supporting range queries. Most do. But then, most support HEAD requests, too.
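As a rough illustration of a range request, the sketch below asks for bytes 0 through 511 using the two-argument AddRange overload, which sends "Range: bytes=0-511". RangeExample and GetFirstBytes are hypothetical names. A server that honours the header should answer 206 Partial Content; one that ignores it answers 200 and sends the full body, so the sketch aborts in that case.

```csharp
using System;
using System.Net;

static class RangeExample
{
    // Hypothetical helper: requests only the first 512 bytes and returns the status code.
    public static HttpStatusCode? GetFirstBytes(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        request.AddRange(0, 511); // inclusive byte positions: the first 512 bytes

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode != HttpStatusCode.PartialContent)
                {
                    // The range was not honoured; drop the connection rather than
                    // letting the full body be read when the response is disposed.
                    request.Abort();
                }
                return response.StatusCode;
            }
        }
        catch (WebException ex)
        {
            // A using statement tolerates a null resource.
            using (var response = ex.Response as HttpWebResponse)
            {
                return response != null ? response.StatusCode : (HttpStatusCode?)null;
            }
        }
    }
}
```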
You'll probably end up having to write a method that tries things in sequence:
- Try a HEAD request. If that works (i.e. doesn't return a 500), then you're done.
- Try a GET with a range query. If that doesn't return a 500, then you're done.
- Do a regular GET, calling request.Abort after GetResponse returns.
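The sequence above could be sketched roughly as follows. LinkProbe, Probe, IsConclusive and Check are hypothetical names. The rule for when to fall through to the next strategy is an assumption: the answer only mentions a 500, and this sketch also treats any other 5xx, a 405 Method Not Allowed, and a missing response as inconclusive.

```csharp
using System;
using System.Net;

static class LinkProbe
{
    // Hypothetical helper: one request, status code only. Null means no HTTP response.
    static HttpStatusCode? Probe(Uri uri, string method, bool firstBytesOnly)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = method;
        request.AllowAutoRedirect = false;
        if (firstBytesOnly)
        {
            request.AddRange(0, 511); // sends "Range: bytes=0-511"
        }

        HttpWebResponse response = null;
        try
        {
            response = (HttpWebResponse)request.GetResponse();
            return response.StatusCode;
        }
        catch (WebException ex)
        {
            response = ex.Response as HttpWebResponse;
            return response != null ? response.StatusCode : (HttpStatusCode?)null;
        }
        finally
        {
            request.Abort(); // drop the connection instead of reading the body
            if (response != null) response.Close();
        }
    }

    // Assumption: 5xx, 405, or no response at all means "try the next strategy".
    static bool IsConclusive(HttpStatusCode? status)
    {
        return status.HasValue
            && (int)status.Value < 500
            && status.Value != HttpStatusCode.MethodNotAllowed;
    }

    public static HttpStatusCode? Check(Uri uri)
    {
        var status = Probe(uri, "HEAD", false);   // 1. HEAD
        if (IsConclusive(status)) return status;

        status = Probe(uri, "GET", true);         // 2. GET with a range
        if (IsConclusive(status)) return status;

        return Probe(uri, "GET", false);          // 3. plain GET, aborted after the headers
    }
}
```

A call such as LinkProbe.Check(new Uri("http://example.com/")) returns the first conclusive status, or whatever the final plain GET produced. Redirect handling from the question's Validate method is left out to keep the sketch short.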