Is it possible to scrape data that a dynamic web page generates? For example, this website generates the <font> tag
with some JavaScript, which is
document.write("<font class=spy2>:<\/font>"+(v2j0j0^o5r8)+(r8d4x4^y5i9)+(b2r8e5^u1p6)+(r8d4x4^y5i9))
The values change on each page refresh. Each generated code represents a digit from 0 to 9, for example (code1)+(code2)+(code3)+(code4),
and presumably some kind of parser behind the scenes understands these codes and generates the digits accordingly.
Once the page is rendered, if for example code1
was assigned somewhere to the digit 4, then wherever the digit 4 appears it comes from that code after it has been parsed.
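One possible reading of the script above: if identifiers such as v2j0j0 and o5r8 are ordinary integer variables set by another script on the page (an assumption; the question does not show them), then each (a^b) is a JavaScript bitwise XOR that yields one digit, and the + operators concatenate those digits after the leading string. A minimal C# sketch of that evaluation follows; the variable values are made up and XorDigitDecoder is a hypothetical helper, not something the site or any library provides.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class XorDigitDecoder
{
    // Evaluates every "(name1^name2)" group with integer XOR and concatenates
    // the results, mirroring JavaScript string concatenation with "+".
    public static string Decode(string expression, IDictionary<string, int> variables)
    {
        var output = new StringBuilder();
        foreach (Match m in Regex.Matches(expression, @"\((\w+)\^(\w+)\)"))
        {
            int left = variables[m.Groups[1].Value];
            int right = variables[m.Groups[2].Value];
            output.Append(left ^ right);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical values; a real page would define these in another script.
        var variables = new Dictionary<string, int>
        {
            { "v2j0j0", 5 },  { "o5r8", 13 }, // 5 ^ 13 = 8
            { "r8d4x4", 7 },  { "y5i9", 7 },  // 7 ^ 7  = 0
            { "b2r8e5", 12 }, { "u1p6", 4 },  // 12 ^ 4 = 8
        };
        string js = "(v2j0j0^o5r8)+(r8d4x4^y5i9)+(b2r8e5^u1p6)+(r8d4x4^y5i9)";
        Console.WriteLine(Decode(js, variables)); // prints 8080
    }
}
```

Because the values change on every refresh, a decoder like this would have to read the variable assignments from the same page load as the expression it evaluates.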
If we use HtmlAgilityPack,
we see the JavaScript code but not its generated output. Is there any way to read the tag it creates once the page is rendered?
Thanks for pointing that out. I tried implementing it and got the same results, but then I saw another comment suggesting the IE engine, so I built a small application that does the job. I added an IE WebBrowser control, navigated it to the website, and read the rendered content. Here is the code:
// Fires once the WebBrowser control has finished loading the document.
private void webBrowser1_DocumentCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    // The element lookup on the right-hand side is not shown in the post.
    System.Windows.Forms.HtmlElementCollection elementsforViewPost = /* ... */;
    foreach (System.Windows.Forms.HtmlElement current2 in elementsforViewPost)
    {
        // Keep only text that looks like a proxy address and is not already collected (needs System.Linq).
        if (current2.InnerText != null && CheckForValidProxyAddress(current2.InnerText) &&
            !ObtainedProxies.Any(index => index.ProxyAddress == current2.InnerText.Trim()))
        {
            Proxy data = new Proxy();
            data.IsRetired = false;
            data.IsActive = true;
            int result = 1;
            data.DomainsVisited = 0;
            data.ProxyAddress = current2.InnerText.Trim();
        }
    }
}
To check that the received text is a valid proxy address, I use the following, which I found on some page long ago by googling:
// Requires: using System.Text.RegularExpressions;
private bool CheckForValidProxyAddress(string address)
{
    // No address provided, so it cannot be valid.
    if (string.IsNullOrEmpty(address))
        return false;
    // Match pattern: four dotted octets (0-255), a colon, then a port of 1 to 5 digits.
    //string pattern = @"^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$:([0-9][0-9][0-9][0-9])";
    string pattern = @"\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b\:[0-9]{1,5}";
    // Create the regular expression object.
    Regex check = new Regex(pattern);
    // Address provided, so use the IsMatch method, searching from index 0.
    return check.IsMatch(address, 0);
}
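The check above only tests whether the text contains something shaped like an address. If a stricter test is wanted, one possible variant (a sketch, not what I actually ran) anchors the pattern to the whole string and also range-checks the port. ProxyAddressValidator and IsValid are hypothetical names, and the 1 to 65535 port limit is my assumption about what counts as valid.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProxyAddressValidator
{
    // One IPv4 octet (0-255), same alternation as in the pattern above.
    private const string Octet = @"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Anchored so the whole string must be "a.b.c.d:port"; the port is
    // captured in a named group so it can be range-checked afterwards.
    private static readonly Regex Pattern = new Regex(
        "^" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet +
        @":(?<port>[0-9]{1,5})$");

    public static bool IsValid(string address)
    {
        if (string.IsNullOrWhiteSpace(address))
            return false;

        Match m = Pattern.Match(address.Trim());
        if (!m.Success)
            return false;

        // Assumption: only ports 1 to 65535 count as valid.
        int port = int.Parse(m.Groups["port"].Value);
        return port >= 1 && port <= 65535;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("192.168.1.10:8080")); // True
        Console.WriteLine(IsValid("192.168.1.10:"));     // False: no port
        Console.WriteLine(IsValid("300.1.1.1:80"));      // False: octet out of range
        Console.WriteLine(IsValid("10.0.0.1:70000"));    // False: port out of range
    }
}
```

Anchoring matters here because InnerText of a larger element could contain an address somewhere in the middle of unrelated text, which an unanchored IsMatch would still accept.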