Problem description
I have a site that requires login before it lets you download files. Currently I am using the BrowserSession class to log in and do all the scraping required (at least for the most part).
The BrowserSession class source is at the bottom of this post.
The download links show up on the document nodes, but I don't know how to add download functionality to that class, and if I try to download them with a WebClient it fails. I have already had to heavily modify the BrowserSession class (I should have modified it as a partial class, but didn't), so I don't really want to move away from it.
I believe it uses HtmlAgilityPack's HtmlWeb to download and load the web pages.
If there is no easy way to modify BrowserSession, is there some way to use its CookieCollection with WebClient?
PS: I need to be logged in to download the file; otherwise the link redirects to the login screen. That is why I cannot simply use WebClient: I either need to modify the BrowserSession class so it can download, or modify WebClient to send the cookies with its request.
I will admit I do not understand cookies very well (I am not sure whether they are sent on every GET or only on POST), but so far BrowserSession has taken care of all that.
PPS: The BrowserSession I posted is not the one I added things to; however, the core functions are all the same.
using System.Collections.Generic; // used by FormElementCollection
using System.Net;
using System.Text;
using HtmlAgilityPack;

public class BrowserSession
{
    private bool _isPost;
    private HtmlDocument _htmlDoc;

    /// <summary>
    /// System.Net.CookieCollection. Provides a collection container for instances of Cookie class
    /// </summary>
    public CookieCollection Cookies { get; set; }

    /// <summary>
    /// Provide a key-value-pair collection of form elements
    /// </summary>
    public FormElementCollection FormElements { get; set; }

    /// <summary>
    /// Makes a HTTP GET request to the given URL
    /// </summary>
    public string Get(string url)
    {
        _isPost = false;
        CreateWebRequestObject().Load(url);
        return _htmlDoc.DocumentNode.InnerHtml;
    }

    /// <summary>
    /// Makes a HTTP POST request to the given URL
    /// </summary>
    public string Post(string url)
    {
        _isPost = true;
        CreateWebRequestObject().Load(url, "POST");
        return _htmlDoc.DocumentNode.InnerHtml;
    }

    /// <summary>
    /// Creates the HtmlWeb object and initializes all event handlers.
    /// </summary>
    private HtmlWeb CreateWebRequestObject()
    {
        HtmlWeb web = new HtmlWeb();
        web.UseCookies = true;
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest);
        web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse);
        web.PreHandleDocument = new HtmlWeb.PreHandleDocumentHandler(OnPreHandleDocument);
        return web;
    }

    /// <summary>
    /// Event handler for HtmlWeb.PreRequestHandler. Occurs before an HTTP request is executed.
    /// </summary>
    protected bool OnPreRequest(HttpWebRequest request)
    {
        AddCookiesTo(request); // Add cookies that were saved from previous requests
        if (_isPost) AddPostDataTo(request); // We only need to add post data on a POST request
        return true;
    }

    /// <summary>
    /// Event handler for HtmlWeb.PostResponseHandler. Occurs after a HTTP response is received
    /// </summary>
    protected void OnAfterResponse(HttpWebRequest request, HttpWebResponse response)
    {
        SaveCookiesFrom(response); // Save cookies for subsequent requests
    }

    /// <summary>
    /// Event handler for HtmlWeb.PreHandleDocumentHandler. Occurs before a HTML document is handled
    /// </summary>
    protected void OnPreHandleDocument(HtmlDocument document)
    {
        SaveHtmlDocument(document);
    }

    /// <summary>
    /// Assembles the Post data and attaches to the request object
    /// </summary>
    private void AddPostDataTo(HttpWebRequest request)
    {
        string payload = FormElements.AssemblePostPayload();
        byte[] buff = Encoding.UTF8.GetBytes(payload);
        request.ContentLength = buff.Length;
        request.ContentType = "application/x-www-form-urlencoded";
        // Dispose the request stream so the body is flushed before the request is sent
        using (System.IO.Stream reqStream = request.GetRequestStream())
        {
            reqStream.Write(buff, 0, buff.Length);
        }
    }

    /// <summary>
    /// Add cookies to the request object
    /// </summary>
    private void AddCookiesTo(HttpWebRequest request)
    {
        if (Cookies != null && Cookies.Count > 0)
        {
            request.CookieContainer.Add(Cookies);
        }
    }

    /// <summary>
    /// Saves cookies from the response object to the local CookieCollection object
    /// </summary>
    private void SaveCookiesFrom(HttpWebResponse response)
    {
        if (response.Cookies.Count > 0)
        {
            if (Cookies == null) Cookies = new CookieCollection();
            Cookies.Add(response.Cookies);
        }
    }

    /// <summary>
    /// Saves the form elements collection by parsing the HTML document
    /// </summary>
    private void SaveHtmlDocument(HtmlDocument document)
    {
        _htmlDoc = document;
        FormElements = new FormElementCollection(_htmlDoc);
    }
}
FormElementCollection class:
/// <summary>
/// Represents a combined list and collection of Form Elements.
/// </summary>
public class FormElementCollection : Dictionary<string, string>
{
    /// <summary>
    /// Constructor. Parses the HtmlDocument to get all form input elements.
    /// </summary>
    public FormElementCollection(HtmlDocument htmlDoc)
    {
        var inputs = htmlDoc.DocumentNode.Descendants("input");
        foreach (var element in inputs)
        {
            string name = element.GetAttributeValue("name", "undefined");
            string value = element.GetAttributeValue("value", "");
            if (!name.Equals("undefined")) Add(name, value);
        }
    }

    /// <summary>
    /// Assembles all form elements and values to POST. Also URL-encodes the values.
    /// </summary>
    public string AssemblePostPayload()
    {
        StringBuilder sb = new StringBuilder();
        foreach (var element in this)
        {
            string value = System.Web.HttpUtility.UrlEncode(element.Value);
            sb.Append("&" + element.Key + "=" + value);
        }
        // Drop the leading '&'; return an empty payload when there are no form elements
        return sb.Length > 0 ? sb.ToString().Substring(1) : string.Empty;
    }
}
Recommended answer
I managed to get it working using BrowserSession and a modified WebClient.
First, make _htmlDoc public so the document nodes can be accessed:
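public class BrowserSession
{
    private bool _isPost;
    public string previous_Response { get; private set; }
    // Previously a private field; now readable from outside the class
    public HtmlDocument _htmlDoc { get; private set; }
    // ... rest of BrowserSession unchanged
}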
Second, add this method to BrowserSession:
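    // Downloads a file that is only served to a logged-in session
    public void DownloadCookieProtectedFile(string url, string Filename)
    {
        using (CookieAwareWebClient wc = new CookieAwareWebClient())
        {
            wc.Cookies = Cookies; // reuse the cookies collected during login
            wc.DownloadFile(url, Filename);
        }
    }
    // rest of BrowserSession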
Third, add this class somewhere; it allows the cookies from BrowserSession to be passed to the WebClient:
using System;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    public CookieCollection Cookies = new CookieCollection();

    // Copies the stored cookies into the request's cookie container
    private void AddCookiesTo(HttpWebRequest request)
    {
        if (Cookies != null && Cookies.Count > 0)
        {
            request.CookieContainer.Add(Cookies);
        }
    }

    // Called by WebClient for every request it creates, including DownloadFile
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest webRequest = request as HttpWebRequest;
        if (webRequest != null)
        {
            // WebClient does not create a cookie container by default
            if (webRequest.CookieContainer == null) webRequest.CookieContainer = new CookieContainer();
            AddCookiesTo(webRequest);
        }
        return request;
    }
}
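On the question's side point about whether cookies go out on GET or only on POST: a CookieContainer selects cookies by the request URI (host and path), not by the HTTP method, so a logged-in session cookie accompanies GET downloads as well. The following is a minimal sketch of that behaviour using CookieContainer.GetCookieHeader; the class name CookieHeaderDemo, the example.com URLs, and the cookie name and value are all made up for illustration.

```csharp
using System;
using System.Net;

public static class CookieHeaderDemo
{
    // Returns the Cookie header the container would attach to a request for the given URL.
    public static string HeaderFor(CookieContainer container, string url)
    {
        return container.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var container = new CookieContainer();
        // Hypothetical session cookie, stored as if example.com had set it at login.
        container.Add(new Uri("http://example.com/"), new Cookie("session", "abc123"));

        // Same host: the cookie is attached, whatever HTTP method the request uses.
        Console.WriteLine(HeaderFor(container, "http://example.com/files/report.pdf"));
        // Different host: no cookie is attached, so this prints an empty line.
        Console.WriteLine(HeaderFor(container, "http://example.org/"));
    }
}
```

Note that GetCookieHeader takes only a URI; nothing in the lookup depends on whether the request is a GET or a POST.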
With these changes you can use BrowserSession as you normally would. When you need a file that is only accessible while you are logged in, call BrowserSession.DownloadCookieProtectedFile() much as you would WebClient.DownloadFile(). Alternatively, use CookieAwareWebClient directly; the only extra step is to set its cookies, like so:
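// session is your logged-in BrowserSession instance; url and fileName are placeholders
using (var wc = new CookieAwareWebClient())
{
    wc.Cookies = session.Cookies;
    // Download with WebClient as normal
    wc.DownloadFile(url, fileName);
}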
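If subclassing WebClient is not desirable, roughly the same effect can be had with HttpWebRequest directly. This is not the approach used above, only a hedged sketch of an alternative: the class CookieDownload and its methods CreateRequest and Download are hypothetical names, and the code assumes the cookies passed in carry a Domain (as cookies taken from an HttpWebResponse do).

```csharp
using System;
using System.IO;
using System.Net;

public static class CookieDownload
{
    // Builds a request that carries the given session cookies.
    public static HttpWebRequest CreateRequest(string url, CookieCollection cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = new CookieContainer();
        if (cookies != null && cookies.Count > 0)
        {
            request.CookieContainer.Add(cookies);
        }
        return request;
    }

    // Streams the response body to a local file.
    public static void Download(string url, string fileName, CookieCollection cookies)
    {
        HttpWebRequest request = CreateRequest(url, cookies);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (FileStream file = File.Create(fileName))
        {
            body.CopyTo(file);
        }
    }
}
```

A call would look like CookieDownload.Download(url, fileName, session.Cookies), where session is the logged-in BrowserSession. If the session has expired, the server will most likely answer with the login page, so it may be worth checking the saved file or response.ResponseUri before trusting the result.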