Parsing inner HTML

Question

This is the HTML I want to parse:

<div class="photoBox pB-ms">
<a href="/user_details?userid=ePDZ9HuMGWR7vs3kLfj3Gg">
<img width="100" height="100" alt="Photo of Debbie K." src="http://s3-media2.px.yelpcdn.com/photo/xZab5rpdueTCJJuUiBlauA/ms.jpg">
</a>
</div>

I am using the following XPath to find it:

HtmlNodeCollection bodyNode = htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']");

This works fine and returns all the div elements with the class 'photoBox pB-ms'.

But when I try to get the href of the a element using

HtmlNodeCollection bodyNode = htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms'//a href]");

I get an "invalid token" error.

I also tried using a query:

var lowestreview =
    from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']")
    from rating in main.SelectNodes("//a href")
    select new { Main = main.Attributes[0].Value, AHref = rating.ToString() };

Can anybody tell me how to write an XPath expression or a query to get this href?

Recommended answer

This works (tested):

HtmlNodeCollection bodyNodes = htmlDoc.DocumentNode
                                      .SelectNodes("//div[@class='photoBox pB-ms']/a[@href]");
foreach(var node in bodyNodes)
{
    string href = node.Attributes["href"].Value;
}

The problem is that you had attribute and element selectors mixed up. Also, from your question it is unclear whether you really intended to query for a collection.

The XPath selector above selects all a elements that have an href attribute and are direct children of a div element whose class attribute is exactly 'photoBox pB-ms'. You can then iterate over this collection and read the href attribute value of each element.
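For reference, here is a minimal end-to-end sketch of that selector, run against the HTML from the question. The class name HrefXPathDemo and the method ExtractHrefs are hypothetical names introduced for illustration only. It also guards against SelectNodes returning null when nothing matches, and uses a relative path ("./a[@href]") for the nested lookup, because a path starting with "//" searches the whole document even when called on a child node.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class HrefXPathDemo
{
    // Returns the href of every <a href="..."> that is a direct child
    // of a <div> whose class attribute is exactly "photoBox pB-ms".
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        HtmlNodeCollection divs =
            htmlDoc.DocumentNode.SelectNodes("//div[@class='photoBox pB-ms']");
        if (divs == null)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            return hrefs;
        }

        foreach (HtmlNode div in divs)
        {
            // "./a[@href]" is relative to the current div.
            HtmlNodeCollection links = div.SelectNodes("./a[@href]");
            if (links == null)
            {
                continue;
            }

            foreach (HtmlNode link in links)
            {
                hrefs.Add(link.GetAttributeValue("href", string.Empty));
            }
        }

        return hrefs;
    }

    public static void Main()
    {
        const string html = @"<div class=""photoBox pB-ms"">
<a href=""/user_details?userid=ePDZ9HuMGWR7vs3kLfj3Gg"">
<img width=""100"" height=""100"" alt=""Photo of Debbie K."" src=""http://s3-media2.px.yelpcdn.com/photo/xZab5rpdueTCJJuUiBlauA/ms.jpg"">
</a>
</div>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Splitting the lookup into two steps mirrors the nested query attempted in the question. If only the links are needed, the single selector "//div[@class='photoBox pB-ms']/a[@href]" does the same job in one call.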

HtmlAgilityPack also supports LINQ (since version 1.4), so getting a particular attribute value can be done more easily (IMO) like this:

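// GetAttributeValue returns the fallback ("") for divs without a class attribute,
// whereas x.Attributes["class"].Value would throw a NullReferenceException on them.
string hrefValue = htmlDoc.DocumentNode
                          .Descendants("div")
                          .Where(x => x.GetAttributeValue("class", "") == "photoBox pB-ms")
                          .Select(x => x.Element("a").Attributes["href"].Value)
                          .FirstOrDefault();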
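If a collection of links is wanted rather than just the first one, the same LINQ approach can be extended. The following is a sketch under that assumption; the class name HrefLinqDemo and the method ExtractHrefs are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class HrefLinqDemo
{
    // Collects the href of every direct <a> child of each matching div.
    public static List<string> ExtractHrefs(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        return htmlDoc.DocumentNode
                      .Descendants("div")
                      // Divs without a class attribute yield "" and are skipped.
                      .Where(d => d.GetAttributeValue("class", "") == "photoBox pB-ms")
                      // Elements("a") yields nothing for a div with no <a> child.
                      .SelectMany(d => d.Elements("a"))
                      .Select(a => a.GetAttributeValue("href", ""))
                      // Drop <a> elements that have no href (or an empty one).
                      .Where(href => href.Length > 0)
                      .ToList();
    }

    public static void Main()
    {
        const string html =
            "<div class=\"photoBox pB-ms\"><a href=\"/user_details?userid=ePDZ9HuMGWR7vs3kLfj3Gg\"></a></div>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Unlike SelectNodes, the LINQ methods return empty sequences rather than null when nothing matches, so no null checks are needed here.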


09-05 12:21