Problem description
I'm using HTML Agility Pack to extract image URLs from a web address that the user enters.
I'm able to fetch images correctly, except from "Paytm.com".
On paytm.com, the page source shows 5 "img" tags, whereas I am getting only 3.
Can anyone tell me why I'm getting only three images in the list instead of five, and how I can solve this issue?
What I have tried:
string[] imgList = new string[20];
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("https://paytm.com/");
var i = 0;
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//img"))
{
    imgList[i] = node.Attributes["src"].Value;
    i++;
}
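One hypothetical way to narrow down the gap between the browser's page source and what the program sees is to count `<img` tags in the raw HTML the program itself downloads. If that count differs from the 5 tags in the browser's page source, the program is probably receiving different HTML. If it matches, the difference more likely lies in how the document is parsed or queried.

This is a sketch under stated assumptions:

- The class and method names are made up for illustration.
- The regex count is a rough check, not an HTML parser. It also counts `<img` text inside comments or scripts.
- The download uses `HttpClient`, which does not necessarily send the same request headers as `HtmlWeb`.

```csharp
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical diagnostic helpers; the names are illustrative only.
public static class ImgTagDiagnostics
{
    // One shared HttpClient instance instead of one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the HTML exactly as the server returns it to this client.
    public static Task<string> DownloadHtmlAsync(string url)
    {
        return Client.GetStringAsync(url);
    }

    // Counts "<img" tags (case-insensitive) in raw HTML text.
    // Rough check only: tags inside comments or script strings are counted too.
    public static int CountImgTags(string html)
    {
        return Regex.Matches(html, @"<img\b", RegexOptions.IgnoreCase).Count;
    }
}
```

For example, inside an `async` method, `ImgTagDiagnostics.CountImgTags(await ImgTagDiagnostics.DownloadHtmlAsync("https://paytm.com/"))` gives a number to compare against the browser's page source.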
Recommended answer
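using System;
using System.Collections.Generic;
using System.Xml.XPath;
using HtmlAgilityPack;

// Wrapper class so the methods compile on their own; the name is arbitrary.
public static class ImageLinks
{
    public static List<string> AllImages(string startURL)
    {
        return SpecificLinks(startURL, "//img", "src");
    }

    // Returns the absolute URLs stored in attributeSelector on every node
    // matched by the XPath expression elementSelector.
    public static List<string> SpecificLinks(string startUrl, string elementSelector, string attributeSelector)
    {
        List<string> links = new List<string>();
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load(startUrl);

        HtmlNodeCollection docNodes;
        try
        {
            // SelectNodes returns null when nothing matches.
            docNodes = doc.DocumentNode.SelectNodes(elementSelector);
        }
        catch (XPathException)
        {
            // Invalid XPath expression: treat it as "no matches".
            docNodes = null;
        }

        if (docNodes == null)
            return links;

        Uri baseUri = new Uri(startUrl);
        foreach (HtmlNode link in docNodes)
        {
            // Empty string when the attribute is missing.
            string elementSource = link.GetAttributeValue(attributeSelector, string.Empty);
            if (string.IsNullOrWhiteSpace(elementSource))
                continue;

            // Resolve relative values against the page URL; values that are
            // already absolute are kept, and values Uri cannot parse are skipped.
            if (Uri.TryCreate(baseUri, elementSource, out Uri uri))
                links.Add(uri.ToString());
        }

        return links;
    }
}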
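The URL-resolution step of SpecificLinks (combining the page URL with each `src` value through `System.Uri`) does not depend on HTML Agility Pack, so it can be tried offline. The sketch below assumes the attribute values have already been extracted. The `SourceResolver.ResolveSources` helper and the sample URLs are hypothetical.

As a design choice, it keeps `src` values that are already absolute, which is usually what an image list needs. It skips blank values and anything `Uri` cannot parse.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the URL-resolution step of SpecificLinks,
// so it can be exercised without downloading a page.
public static class SourceResolver
{
    public static List<string> ResolveSources(string pageUrl, IEnumerable<string> rawValues)
    {
        var baseUri = new Uri(pageUrl);
        var resolved = new List<string>();

        foreach (string raw in rawValues)
        {
            // Skip missing or blank attribute values.
            if (string.IsNullOrWhiteSpace(raw))
                continue;

            // Relative values are resolved against the page URL; absolute values
            // are kept (in Uri's canonical form); unparseable values are skipped.
            if (Uri.TryCreate(baseUri, raw, out Uri uri))
                resolved.Add(uri.ToString());
        }

        return resolved;
    }
}
```

For example, resolving `images/logo.png` against `https://paytm.com/` gives `https://paytm.com/images/logo.png`. An already absolute value such as `https://cdn.example.com/a.png` is kept as-is.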