Problem description
Hi all, I am facing a technical issue. I have read several articles looking for the answer, but I couldn't find a proper one on any website.

I am using ScrapySharp in my project to crawl web page data. The issue came up when I tried to crawl data from the http://edition.cnn.com/POLITICS website. First, I loaded the page in IE and opened the Developer Tools to inspect the tags. There I selected the element I need, which in my code is "//div[@class='cd__content']".

However, when I load the same page through ScrapySharp,

using System;
using HtmlAgilityPack;      // HtmlNodeCollection
using ScrapySharp.Network;  // ScrapingBrowser, WebPage
ScrapingBrowser browser = new ScrapingBrowser();
WebPage rootPage = await browser.NavigateToPageAsync(new Uri(url)); // returns Task<WebPage>, so await it
HtmlNodeCollection rootNodes = rootPage.Html.SelectNodes("//div[@class='cd__content']");

the result for rootNodes is null. When I investigated further, I saw that the cd__content elements sit inside a "SECTION" tag, and when the page loads, that "SECTION" tag is empty.
But when I inspect the page in IE or Chrome, all the tags are filled with information, which is why I was able to pick the element there. When I load the page programmatically, they are not.

My question is: using ScrapySharp, how can I load the page with all of its information filled in? Experts, please help.
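A likely explanation, stated here as an assumption rather than something verified against CNN's current markup: the browser's developer tools show the live DOM after the page's JavaScript has run, whereas ScrapySharp downloads the HTML the server returns and parses it without executing any scripts. The self-contained C# sketch below illustrates this with two hypothetical, simplified HTML strings (not CNN's real markup): the same XPath finds nothing in a server response whose section is empty, and finds the elements in the script-filled version. It uses System.Xml instead of HtmlAgilityPack only so that it needs no NuGet packages.

```csharp
using System;
using System.Xml;

// Hypothetical, simplified markup for illustration only: NOT CNN's real HTML.
public static class StaticVsRenderedDemo
{
    public const string TargetXPath = "//div[@class='cd__content']";

    // Roughly what a client that does not run JavaScript (e.g. ScrapySharp)
    // receives: the container exists, but it stays empty until a script fills it.
    public const string ServerHtml =
        "<html><body>" +
        "<section id=\"zone\"></section>" +
        "<script>/* fills the section after the page loads */</script>" +
        "</body></html>";

    // Roughly what the browser's developer tools show once scripts have run.
    public const string RenderedHtml =
        "<html><body>" +
        "<section id=\"zone\">" +
        "<div class=\"cd__content\">Story 1</div>" +
        "<div class=\"cd__content\">Story 2</div>" +
        "</section>" +
        "</body></html>";

    // Counts XPath matches. XmlDocument works here only because both samples are
    // well-formed XML. XmlDocument.SelectNodes yields an empty list when nothing
    // matches, whereas HtmlAgilityPack's SelectNodes returns null by default.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html);
        return doc.SelectNodes(xpath)?.Count ?? 0;
    }

    public static void Main()
    {
        Console.WriteLine($"Server HTML:   {CountMatches(ServerHtml, TargetXPath)} match(es)");
        Console.WriteLine($"Rendered HTML: {CountMatches(RenderedHtml, TargetXPath)} match(es)");
    }
}
```

Running it reports 0 matches for the server HTML and 2 for the rendered HTML: the XPath itself is fine, but the elements are not in the downloaded markup. If the live page behaves like this, any approach that only parses the downloaded HTML, ScrapySharp included, will see the empty section.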