How do I extract image URLs using HTML Agility Pack?

Question

I'm using HTML Agility Pack to extract image URLs from a web address entered by the user.

This works for every site I have tried except paytm.com.

When I view the page source of paytm.com, it contains 5 "img" tags, but my code returns only 3.

Can anyone tell me why the list contains only three images instead of five, and how I can fix this?

What I have tried:

string[] imgList = new string[20];
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("https://paytm.com/");
var i=0;
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//img"))
{
    imgList[i] = node.Attributes["src"].Value;
    i++;
}

Recommended answer

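using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Place both methods inside a class of your own.

// Returns the absolute URL of every <img> element that has a src attribute.
public static List<string> AllImages(string startURL)
{
    return SpecificLinks(startURL, "//img", "src");
}

public static List<string> SpecificLinks(string startUrl, string elementSelector, string attributeSelector)
{
    List<string> links = new List<string>();

    // HtmlWeb.Load downloads the page over HTTP, unlike HtmlDocument.Load,
    // which reads from a local file path or a stream.
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(startUrl);
    HtmlNodeCollection docNodes;

    try
    {
        // SelectNodes returns null when no element matches the XPath expression.
        docNodes = doc.DocumentNode.SelectNodes(elementSelector);
    }
    catch
    {
        docNodes = null;
    }

    if (docNodes != null)
    {
        Uri baseUri = new Uri(startUrl);

        // Iterate the collection already fetched instead of querying the document again.
        foreach (HtmlNode link in docNodes)
        {
            // "#" is the placeholder returned for elements that lack the attribute.
            string elementSource = link.GetAttributeValue(attributeSelector, "#");

            if (elementSource.Equals("#"))
                continue;

            try
            {
                // Resolve relative values against the page URL. Values that are
                // already absolute are kept; the earlier version discarded them.
                Uri uri = new Uri(baseUri, elementSource);
                links.Add(uri.ToString());
            }
            catch (Exception)
            {
                // Skip values that cannot be parsed as a URI.
            }
        }
    }

    return links;
}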
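
The URL-resolution step in the answer can be tried on its own, without downloading a page. The following is only a minimal sketch using the standard System.Uri class: the UrlResolver class, the ResolveAll method, and the example.com addresses are hypothetical names chosen for illustration and are not part of HTML Agility Pack or of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class UrlResolver
{
    // Hypothetical helper: turns raw src values into absolute URLs,
    // resolving relative ones against the address of the page.
    public static List<string> ResolveAll(string pageUrl, IEnumerable<string> rawSources)
    {
        Uri baseUri = new Uri(pageUrl);
        List<string> resolved = new List<string>();

        foreach (string raw in rawSources)
        {
            // Skip <img> elements whose src is missing or empty.
            if (string.IsNullOrWhiteSpace(raw))
                continue;

            // TryCreate returns false instead of throwing when the value cannot be parsed.
            if (Uri.TryCreate(baseUri, raw, out Uri absolute))
                resolved.Add(absolute.AbsoluteUri);
        }

        return resolved;
    }
}
```

For example, calling UrlResolver.ResolveAll("https://example.com/shop/", new[] { "img/a.png", "https://cdn.example.net/b.png", "" }) should yield two entries: the relative path resolved to https://example.com/shop/img/a.png, and the absolute URL left as it is. The empty value is skipped. Using TryCreate rather than the Uri constructor avoids a try/catch block around each value.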
