Html Agility Pack: loading and scraping a web page
Problem description
Is this the best way to get a web page when scraping?
HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(resp.GetResponseStream());
var element = doc.GetElementbyId("start-left"); // GetElementbyId takes a bare id, not an XPath expression
var element2 = doc.DocumentNode.SelectSingleNode("//body");
string html = doc.DocumentNode.OuterHtml;
I've seen HtmlWeb().Load used to get a web page. Is that a better alternative for loading and then scraping the page?
OK, I'll try it.
HtmlDocument doc = web.Load(url);
Now that I have my doc, it doesn't expose as many members. There is nothing like SelectSingleNode. The only one I can use is GetElementbyId, and that works, but I want to select by class.
Do I need to do it like this?
var htmlBody = doc.DocumentNode.SelectSingleNode("//body");
htmlBody.SelectSingleNode("//paging");
Recommended answer
It is much easier to use HtmlWeb.
string Url = "http://something";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
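On the follow-up about SelectSingleNode: that method is a member of HtmlNode, not of HtmlDocument, so it is reached through doc.DocumentNode. The following is a minimal sketch of selecting by id and by class. It assumes the HtmlAgilityPack NuGet package is referenced. The inline markup and the class name "paging" are hypothetical stand-ins for a downloaded page, and LoadHtml is used in place of web.Load only so the example runs without network access.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup standing in for a page fetched with HtmlWeb.Load.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<div id='start-left'>Left</div>" +
            "<div class='paging current'>Page 1</div>" +
            "</body></html>");

        // Lookup by id is available directly on the document.
        HtmlNode byId = doc.GetElementbyId("start-left");

        // XPath queries live on HtmlNode, so start from DocumentNode.
        // The concat/normalize-space idiom matches "paging" as a whole
        // class token even when the attribute lists several classes.
        HtmlNode byClass = doc.DocumentNode.SelectSingleNode(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' paging ')]");

        // SelectSingleNode returns null when nothing matches, hence the ?. operator.
        Console.WriteLine(byId?.InnerText);
        Console.WriteLine(byClass?.InnerText);
    }
}
```

With a live page, the two lines that build doc would be replaced by the HtmlWeb call shown above. The XPath queries stay the same.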