c# - HTMLAgilityPack and HTML page

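I have this HTML page: http://pastebin.com/ewN5NZis
I would like to use HtmlAgilityPack to get the following result:
List 1: Title1, Title2
List 2: John, Anthony
List 3: 29 April 2014, 28 April 2014
I want to store the data in three separate lists.
This is what I am trying: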

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.LoadHtml(html);

        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//tr"))
        {
            res += node.InnerHtml;
        }

The res variable now holds all of the document's markup, right? What do I need to do next to get the three lists?
Thanks.

Best answer

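Grabbing all of the raw markup is not recommended, because you would then have to split it yourself, which is suicide.
Try this instead (select each <td> by its specific class, and read InnerText rather than InnerHtml):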

        // Requires: using System.Collections.Generic; using HtmlAgilityPack;
        List<string> topicList = new List<string>();
        List<string> authorList = new List<string>();
        List<string> lastPostList = new List<string>();

        // Note: SelectNodes returns null when nothing matches, so add a null check if a class may be absent.
        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='topic starter']"))
        {
            topicList.Add(node.InnerText);
        }
        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='author']"))
        {
            authorList.Add(node.InnerText);
        }
        foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='lastpost']"))
        {
            // This will also include the author of the last post (e.g. Antony 24/10/09).
            lastPostList.Add(node.InnerText);
        }
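
The three loops above differ only in their XPath expression, so they could be folded into one helper. The following is a minimal sketch, not part of the original answer: the class name CellReader and the method name CollectText are hypothetical, and it assumes the HtmlAgilityPack package is referenced. It also guards against SelectNodes returning null when no node matches, and decodes HTML entities with HtmlEntity.DeEntitize before trimming.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (name is an assumption, not from the original answer).
static class CellReader
{
    // Returns the trimmed, entity-decoded text of every node matching the XPath.
    public static List<string> CollectText(HtmlDocument doc, string xpath)
    {
        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return result;
    }
}
```

With this in place, each list becomes a single call, for example CellReader.CollectText(htmlDoc, "//td[@class='author']"), and a missing class yields an empty list instead of a NullReferenceException.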

If you need to separate the text (the author of the last post and the date), you can call the Split() method on the string.
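
As an illustration of that last step, here is a minimal sketch that separates the author from the date. It assumes each lastpost cell has the shape shown in the comment above ("Antony 24/10/09"), that is, the date is the final whitespace-separated token. The names LastPostParser and SplitLastPost are hypothetical. Splitting at the last whitespace character rather than calling Split(' ') keeps author names that contain spaces in one piece.

```csharp
using System;

// Hypothetical helper (names are assumptions, not from the original answer).
static class LastPostParser
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Splits text such as "Antony 24/10/09" into its author and date parts.
    // Assumes the date is the last whitespace-separated token.
    public static (string Author, string Date) SplitLastPost(string text)
    {
        string trimmed = text.Trim();
        int idx = trimmed.LastIndexOfAny(Whitespace);
        if (idx < 0)
        {
            // No separator found: treat the whole string as the author.
            return (trimmed, string.Empty);
        }
        return (trimmed.Substring(0, idx).TrimEnd(), trimmed.Substring(idx + 1));
    }
}
```

For example, LastPostParser.SplitLastPost(lastPostList[0]) would give the two parts for the first row. If the real page separates the fields differently, the split rule needs to be adjusted accordingly.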

Regarding c# - HTMLAgilityPack and HTML page, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23429441/