Strip all HTML tags with Html Agility Pack

Question

I have an HTML string like this:

    <html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>

I wish to strip all HTML tags so that the resulting string becomes:

    foo bar baz

From another post here at SO I've come up with this function (which uses the Html Agility Pack):

    Public Shared Function stripTags(ByVal html As String) As String
        Dim plain As String = String.Empty
        Dim htmldoc As New HtmlAgilityPack.HtmlDocument

        htmldoc.LoadHtml(html)
        Dim invalidNodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//html|//body|//p|//a")

        If Not htmldoc Is Nothing Then
            For Each node In invalidNodes
                node.ParentNode.RemoveChild(node, True)
            Next
        End If

        Return htmldoc.DocumentNode.WriteContentTo
    End Function

Unfortunately this does not return what I expect. Instead it gives:

    bazbarfoo

Where am I going wrong, and is this the best approach?

Regards and happy coding!

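UPDATE: based on the answer below I came up with this function, which might be useful to others:

    Public Shared Function stripTags(ByVal html As String) As String
        Dim htmldoc As New HtmlAgilityPack.HtmlDocument
        ' The tag literals in the two Replace calls were lost when this post was extracted;
        ' "</p>" and "<br/>" are assumed here (paragraph ends become blank lines, line breaks become newlines).
        htmldoc.LoadHtml(html.Replace("</p>", "</p>" & New String(Environment.NewLine, 2)).Replace("<br/>", Environment.NewLine))
        Return htmldoc.DocumentNode.InnerText
    End Function
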
Solution
Why not just return htmldoc.DocumentNode.InnerText instead of removing all the non-text nodes? It should give you what you want.
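
The suggestion above can be sketched as a small console program. This is only an illustration, assuming the HtmlAgilityPack package is referenced by the project; the module name `StripTagsExample` and the extra `HtmlEntity.DeEntitize` call (which turns entities such as `&amp;` back into plain characters) are additions for the example and not part of the original answer.

```vbnet
Imports System
Imports HtmlAgilityPack

Module StripTagsExample
    ' Returns the text content of an HTML string with the markup removed.
    Public Function StripTags(ByVal html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        ' InnerText concatenates the text nodes in document order,
        ' so no nodes need to be removed by hand.
        ' DeEntitize is an optional extra step that decodes HTML entities.
        Return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText)
    End Function

    Sub Main()
        Dim html As String = "<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>"
        Console.WriteLine(StripTags(html))
    End Sub
End Module
```

For the sample input from the question, this should yield the text foo bar baz, since the three text nodes are read in their original order rather than being re-inserted into the tree.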
