Is there any way to download a file using "BrowserSession"? (C#)

Question

I have a site that requires login before it lets you download files. Currently I am using the BrowserSession class to log in and do all the scraping required (at least for the most part).

The BrowserSession class source is at the bottom of the post:

The download links show up on the document nodes, but I don't know how to add download functionality to that class, and if I try to download them with a WebClient it fails. I have already had to modify the BrowserSession class heavily (I should have done it as a partial class, but didn't), so I don't really want to move away from using BrowserSession.

I believe it uses HtmlAgilityPack's HtmlWeb to download and load the web pages.

If there is no easy way to modify BrowserSession, is there some way to use its CookieCollection with WebClient?

PS: I need to be logged in to download the file; otherwise the link redirects to the login screen. That is why I can't simply use WebClient: I either need to modify the BrowserSession class so it can download, or modify WebClient so it sends the cookies when requesting a page.

I will admit I do not understand cookies very well (I am not sure whether they are sent with every GET or only with POST), but so far BrowserSession has taken care of all that.

PPS: The BrowserSession I posted is not the version I added things to, but the core functions are all the same.

public class BrowserSession
{
private bool _isPost;
private HtmlDocument _htmlDoc;

/// <summary>
/// System.Net.CookieCollection. Provides a collection container for instances of Cookie class 
/// </summary>
public CookieCollection Cookies { get; set; }

/// <summary>
/// Provide a key-value-pair collection of form elements 
/// </summary>
public FormElementCollection FormElements { get; set; }

/// <summary>
/// Makes a HTTP GET request to the given URL
/// </summary>
public string Get(string url)
{
    _isPost = false;
    CreateWebRequestObject().Load(url);
    return _htmlDoc.DocumentNode.InnerHtml;
}

/// <summary>
/// Makes a HTTP POST request to the given URL
/// </summary>
public string Post(string url)
{
    _isPost = true;
    CreateWebRequestObject().Load(url, "POST");
    return _htmlDoc.DocumentNode.InnerHtml;
}

/// <summary>
/// Creates the HtmlWeb object and initializes all event handlers. 
/// </summary>
private HtmlWeb CreateWebRequestObject()
{
    HtmlWeb web = new HtmlWeb();
    web.UseCookies = true;
    web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest);
    web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse);
    web.PreHandleDocument = new HtmlWeb.PreHandleDocumentHandler(OnPreHandleDocument);
    return web;
}

/// <summary>
/// Event handler for HtmlWeb.PreRequestHandler. Occurs before an HTTP request is executed.
/// </summary>
protected bool OnPreRequest(HttpWebRequest request)
{
    AddCookiesTo(request);               // Add cookies that were saved from previous requests
    if (_isPost) AddPostDataTo(request); // We only need to add post data on a POST request
    return true;
}

/// <summary>
/// Event handler for HtmlWeb.PostResponseHandler. Occurs after a HTTP response is received
/// </summary>
protected void OnAfterResponse(HttpWebRequest request, HttpWebResponse response)
{
    SaveCookiesFrom(response); // Save cookies for subsequent requests
}

/// <summary>
/// Event handler for HtmlWeb.PreHandleDocumentHandler. Occurs before a HTML document is handled
/// </summary>
protected void OnPreHandleDocument(HtmlDocument document)
{
    SaveHtmlDocument(document);
}

/// <summary>
/// Assembles the Post data and attaches to the request object
/// </summary>
private void AddPostDataTo(HttpWebRequest request)
{
    string payload = FormElements.AssemblePostPayload();
    byte[] buff = Encoding.UTF8.GetBytes(payload.ToCharArray());
    request.ContentLength = buff.Length;
    request.ContentType = "application/x-www-form-urlencoded";
    // Dispose the request stream so the body is flushed before the response is read
    using (System.IO.Stream reqStream = request.GetRequestStream())
    {
        reqStream.Write(buff, 0, buff.Length);
    }
}

/// <summary>
/// Add cookies to the request object
/// </summary>
private void AddCookiesTo(HttpWebRequest request)
{
    if (Cookies != null && Cookies.Count > 0)
    {
        request.CookieContainer.Add(Cookies);
    }
}

/// <summary>
/// Saves cookies from the response object to the local CookieCollection object
/// </summary>
private void SaveCookiesFrom(HttpWebResponse response)
{
    if (response.Cookies.Count > 0)
    {
        if (Cookies == null)  Cookies = new CookieCollection(); 
        Cookies.Add(response.Cookies);
    }
}

/// <summary>
/// Saves the form elements collection by parsing the HTML document
/// </summary>
private void SaveHtmlDocument(HtmlDocument document)
{
    _htmlDoc = document;
    FormElements = new FormElementCollection(_htmlDoc);
}
}

FormElementCollection class:

/// <summary>
/// Represents a combined list and collection of Form Elements.
/// </summary>
public class FormElementCollection : Dictionary<string, string>
{
/// <summary>
/// Constructor. Parses the HtmlDocument to get all form input elements. 
/// </summary>
public FormElementCollection(HtmlDocument htmlDoc)
{
    var inputs = htmlDoc.DocumentNode.Descendants("input");
    foreach (var element in inputs)
    {
        string name = element.GetAttributeValue("name", "undefined");
        string value = element.GetAttributeValue("value", "");
        if (!name.Equals("undefined")) Add(name, value);
    }
}

/// <summary>
/// Assembles all form elements and values to POST. Also URL-encodes the values.
/// </summary>
public string AssemblePostPayload()
{
    StringBuilder sb = new StringBuilder();
    foreach (var element in this)
    {
        string value = System.Web.HttpUtility.UrlEncode(element.Value);
        sb.Append("&" + element.Key + "=" + value);
    }
    return sb.ToString().Substring(1);
}
}

Recommended answer

I managed to get it working using BrowserSession and a modified WebClient:

First, expose _htmlDoc publicly (here as a property with a private setter) so the document nodes can be accessed:

public class BrowserSession
{
    private bool _isPost;
    public string previous_Response { get; private set; }
    public HtmlDocument _htmlDoc { get; private set; }
}

Second, add this method to BrowserSession:

public void DownloadCookieProtectedFile(string url, string Filename)
{
    using (CookieAwareWebClient wc = new CookieAwareWebClient())
    {
        wc.Cookies = Cookies;
        wc.DownloadFile(url, Filename);
    }
}
//rest of BrowserSession

Third, add this class somewhere; it lets you pass the cookies from BrowserSession to the WebClient:

public class CookieAwareWebClient : WebClient
{
    public CookieCollection Cookies = new CookieCollection();
    private void AddCookiesTo(HttpWebRequest request)
    {
        if (Cookies != null && Cookies.Count > 0)
        {
            request.CookieContainer.Add(Cookies);
        }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest webRequest = request as HttpWebRequest;
        if (webRequest != null)
        {
            if (webRequest.CookieContainer == null) webRequest.CookieContainer = new CookieContainer();
            AddCookiesTo(webRequest);
        }
        return request;
    }
}
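On the cookie question raised in the post: as far as I understand it, a CookieContainer picks cookies by the request URL (domain and path), not by the HTTP method, so they go out on GET as well as POST. The following is a minimal sketch, not part of the original answer, that performs the same CookieContainer.Add(CookieCollection) call as AddCookiesTo and prints the resulting Cookie header without touching the network. The cookie name, value, hosts, and the CookieHeaderDemo/HeaderFor names are made up for illustration.

```csharp
using System;
using System.Net;

public static class CookieHeaderDemo
{
    // Copies saved cookies into a fresh container (as AddCookiesTo does)
    // and returns the Cookie header the container would send for this URL.
    public static string HeaderFor(CookieCollection saved, string url)
    {
        var container = new CookieContainer();
        if (saved != null && saved.Count > 0)
        {
            // Each cookie must carry a Domain, otherwise Add throws.
            container.Add(saved);
        }
        return container.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        // Hypothetical session cookie, standing in for what a login response sets.
        var saved = new CookieCollection();
        saved.Add(new Cookie("sessionid", "abc123", "/", "example.com"));

        // Same host: the cookie is included, whatever HTTP method is used later.
        Console.WriteLine(HeaderFor(saved, "http://example.com/files/report.zip"));

        // Different host: no cookie matches, so the header is empty.
        Console.WriteLine(HeaderFor(saved, "http://other.example.org/"));
    }
}
```

Cookies that come from HttpWebResponse.Cookies already have their domain filled in, which is presumably why passing BrowserSession.Cookies straight into the container works.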

This lets you use BrowserSession as you normally would. When you need a file that is only accessible while logged in, simply call DownloadCookieProtectedFile() on your BrowserSession. Alternatively, use CookieAwareWebClient directly as if it were a WebClient, only setting the cookies first, like so:

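// session is your logged-in BrowserSession instance;
// url and fileName are placeholders for the link and the target path
using (var wc = new CookieAwareWebClient())
{
    wc.Cookies = session.Cookies;
    // Download with WebClient as normal
    wc.DownloadFile(url, fileName);
}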
