Regular expression to remove all spans from HTML, keeping inner text as it is

Question

I am looking for a regular expression that can erase all spans while keeping their inner text. I have spans like these in my inner HTML.

Input

Well-formatted HTML:

    <span style='font-size:10.0pt;font-family:"Arial","sans serif"'> First span </span>
    <span style="color:#221E1F;">
    <span style='font-size:10.0pt;font-family:"Arial";color:windowtext'> This is to test Regular expression </span>
    </span>
    <span style="color:#221E1F;"><span style='font-size:10.0pt;font-family: "Arial","sans-serif";color:#548DD4'> last Span text </span> </span>

Badly formatted HTML:

    <span style='font-size:10.0pt;font-family:"Arial","sans-serif";
    mso-bidi-font-style:italic'>&lt;%T</span><span class="A1"><span style='font-size:
    10.0pt;font-family:"Arial","sans-serif";mso-fareast-font-family:Calibri;
    mso-fareast-theme-font:minor-latin;color:windowtext'>PA_Enrollment_Options%&gt; one of the convenient options below</span></span><span class="A1"><span style='font-size:10.0pt;font-family:"Arial","sans-serif";mso-fareast-font-family:
    Calibri;mso-fareast-theme-font:minor-latin;color:#548DD4;mso-themecolor:text2;
    mso-themetint:153'>: <o:p></o:p></span></span>

Expected output: First Span This is to test Regular expression last span text

I have tried this regex:

    (<span.*([\r\n]).*>)|(<span.*>)|(</span>)

It works when my HTML is properly formatted, but in my case the indentation of the HTML is not proper.

I am not using regex to parse the HTML completely. I am doing this operation on inner HTML only.

Recommended answer

You can do it properly with HtmlAgilityPack:

    public string getCleanHtml(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        // To convert HTML entities to their literal characters, use this instead:
        // return HtmlAgilityPack.HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return doc.DocumentNode.InnerText; // keeps HTML entities as they are
    }

And then:

    var result = getCleanHtml(myInputHtml);

In case you need to get rid of whitespace, you can use a simple String.Replace, a Regex.Replace, or a split/join method, depending on what you actually need.
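The answer names three whitespace cleanup options without showing them. The following is a minimal sketch of each, assuming the text has already been extracted (for example with `getCleanHtml`). The class name `WhitespaceCleaner` and its method names are hypothetical, chosen only for illustration; the calls themselves (`Regex.Replace`, `String.Split`, `String.Join`, `String.Replace`) are standard .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original answer.
public static class WhitespaceCleaner
{
    // Regex.Replace: collapse every run of whitespace (spaces, tabs,
    // line breaks) into a single space, then trim both ends.
    public static string CollapseWithRegex(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Split/join: split on common whitespace characters, drop the empty
    // pieces, and re-join the remaining words with single spaces.
    public static string CollapseWithSplitJoin(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // String.Replace: removes only the exact substrings named, here
    // carriage returns and line feeds. Other whitespace is left alone.
    public static string RemoveLineBreaks(string text)
    {
        return text.Replace("\r", "").Replace("\n", "");
    }
}
```

A call such as `WhitespaceCleaner.CollapseWithRegex(getCleanHtml(myInputHtml))` would then give a single-line result. The regex and split/join variants normalize spacing between words, while the `String.Replace` variant only strips line breaks, so which one fits depends on how much of the original spacing should survive.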
08-30 23:10