Html Agility Pack: load and scrape a webpage


Problem description

Is this the best way to get a webpage when scraping?

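using System.Net;

HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
// Dispose the response and its stream once the document has been parsed
using (HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse())
using (var stream = resp.GetResponseStream())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load(stream);

    // GetElementbyId takes a plain id value, not an XPath expression
    var element = doc.GetElementbyId("start-left");
    var element2 = doc.DocumentNode.SelectSingleNode("//body");
    string html = doc.DocumentNode.OuterHtml;
}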
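As an aside, here is a minimal sketch of the same load step using HttpClient, which newer .NET versions recommend over HttpWebRequest (WebRequest and HttpWebRequest have been marked obsolete since .NET 6). It assumes the HtmlAgilityPack NuGet package; PageLoader, FetchAsync and Parse are hypothetical names chosen for illustration.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class PageLoader
{
    // One shared HttpClient instance; creating one per request can exhaust sockets.
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the page body as a string, then parses it.
    public static async Task<HtmlDocument> FetchAsync(string url)
    {
        string html = await Http.GetStringAsync(url);
        return Parse(html);
    }

    // Parsing is kept separate so it can be exercised without network access.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

Usage would be `var doc = await PageLoader.FetchAsync(url);`, after which the `doc.DocumentNode.SelectSingleNode(...)` calls work exactly as in the question.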

I've seen HtmlWeb().Load used to get a webpage. Is that a better alternative for loading and then scraping the webpage?

OK, I tried it:

var web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
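For comparison, a minimal sketch of the HtmlWeb route, assuming the HtmlAgilityPack package. HtmlWeb.Load(url) downloads and parses in one call and returns an ordinary HtmlDocument, so querying works the same way afterwards. HtmlWebExample and Title are hypothetical names.

```csharp
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class HtmlWebExample
{
    // Fetches and parses the page in a single call.
    public static HtmlDocument Load(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }

    // Reads the <title> text; queries start from doc.DocumentNode.
    // Returns an empty string when the page has no <title>.
    public static string Title(HtmlDocument doc)
    {
        return doc.DocumentNode.SelectSingleNode("//title")?.InnerText.Trim() ?? string.Empty;
    }
}
```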

Now that I have my doc, it doesn't expose as many members as I expected; there is nothing like SelectSingleNode. The only one I can use is GetElementbyId, and that works, but I want to select elements by class.
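SelectSingleNode and SelectNodes are members of HtmlNode rather than HtmlDocument, so they are reached through doc.DocumentNode instead of doc itself. A sketch of selecting elements by class on that basis, assuming HtmlAgilityPack; ClassSelector and ByClass are hypothetical, and the class name is assumed not to contain quotes because it is spliced into the XPath string.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class ClassSelector
{
    // Returns every element whose class attribute contains className as a whole token.
    // Padding the normalized attribute with spaces prevents "paging" from
    // matching "paging-footer".
    public static IReadOnlyList<HtmlNode> ByClass(HtmlDocument doc, string className)
    {
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }
}
```

For a single match, `doc.DocumentNode.SelectSingleNode(...)` accepts the same XPath.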

Do I need to do this?

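var htmlBody = doc.DocumentNode.SelectSingleNode("//body");
// "//paging" would match <paging> elements; this matches class="paging" below htmlBody
htmlBody.SelectSingleNode(".//*[@class='paging']");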
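Selecting <body> first is not required, and it does not narrow a query that starts with //: from any context node, // is evaluated from the document root, while .// searches only beneath that node. A small sketch that counts div elements both ways to show the difference, assuming HtmlAgilityPack; RelativeXPath and CountDivs are hypothetical names.

```csharp
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package

public static class RelativeXPath
{
    // Counts <div> elements twice from the same context node:
    // "//div" searches the whole document, ".//div" only the node's descendants.
    public static (int Absolute, int Relative) CountDivs(HtmlNode context)
    {
        int absolute = context.SelectNodes("//div")?.Count ?? 0;
        int relative = context.SelectNodes(".//div")?.Count ?? 0;
        return (absolute, relative);
    }
}
```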

Recommended answer