How to extract valid words from a string

Problem description

I have this string:

&lt;p&gt;&lt;img width=&quot;600&quot; height=&quot;366&quot; src=&quot;http://www.channelstv.com/wp-content/uploads/2014/01/boko-haram-2.jpg&quot; class=&quot;attachment-post-thumbnail wp-post-image&quot; alt=&quot;boko-haram-2&quot; /&gt;&lt;/p&gt;&lt;em&gt;&lt;strong&gt;&lt;Eleven people have been killed in a late night attack in Jakana village, Kaga Local Government Area of Borno State, north east Nigeria as residents of Magumeri flee after rumour of an impending attack by Boko Haram.&lt;/strong&gt;&lt;/em&gt;

I want to extract only the valid words and skip any word that is not readable. I have tried this:

<pre>try
{
    string msg = "&lt;p&gt;&lt;img width=&quot;600&quot; height=&quot;366&quot; src=&quot;http://www.channelstv.com/wp-content/uploads/2014/01/boko-haram-2.jpg&quot; class=&quot;attachment-post-thumbnail wp-post-image&quot; alt=&quot;boko-haram-2&quot; /&gt;&lt;/p&gt;&lt;em&gt;&lt;strong&gt;&lt;Eleven people have been killed in a late night attack in Jakana village, Kaga Local Government Area of Borno State, north east Nigeria as residents of Magumeri flee after rumour of an impending attack by Boko Haram.&lt;/strong&gt;&lt;/em&gt;";
    string retrieve = msg.Substring(20, 105);
}
catch (IndexOutOfRangeException)
{
}</pre>

but it doesn't give me the desired result. Any assistance will be appreciated. Thanks in advance...

[edit]SHOUTING removed - OriginalGriff[/edit]

Solution

Please read my comments to the question.

There are two general ways to extract a substring from a source string:
1) How to: Search Strings Using String Methods (C# Programming Guide) [^]
2) How to: Search Strings Using Regular Expressions (C# Programming Guide) [^]

At the risk of appearing arrogant, I will tell you what you really want to do.

First, HTML decode the string. This converts HTML-encoded content like "&lt;" to "<". See this [^] link for how to do this.

Once you have a string that contains HTML and plain text, remove the HTML using my StringParser [^] utility's removeHtml() method. You will be left with just the text that you wanted in the first place. Using your example, the resulting string will be:

Eleven people have been killed in a late night attack in Jakana village, Kaga Local Government Area of Borno State, north east Nigeria as residents of Magumeri flee after rumour of an impending attack by Boko Haram.

/ravi

<pre>string retrieve = msg.Substring(msg.IndexOf("Eleven"), (msg.IndexOf(".<") - msg.IndexOf("Eleven")));</pre>
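The decode-then-strip approach suggested above can be sketched roughly as follows. This is not the StringParser utility mentioned in the answer: `HtmlTextExtractor` and `ExtractPlainText` are hypothetical names made up for illustration, and a simple regular expression stands in for a real HTML parser. It only assumes the input looks like the string in the question, including the stray "&lt;" before "Eleven".

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the StringParser utility
// referenced in the answer above.
public static class HtmlTextExtractor
{
    public static string ExtractPlainText(string encodedHtml)
    {
        if (string.IsNullOrEmpty(encodedHtml))
        {
            return string.Empty;
        }

        // Step 1: turn entities such as &lt; and &quot; back into < and ".
        string decoded = WebUtility.HtmlDecode(encodedHtml);

        // Step 2: replace each well-formed tag with a space. Excluding '<'
        // inside the tag stops a stray '<' (like the one before "Eleven")
        // from swallowing the text that follows it.
        string withoutTags = Regex.Replace(decoded, "<[^<>]+>", " ");

        // Step 3: replace any leftover angle brackets with a space.
        string withoutBrackets = Regex.Replace(withoutTags, "[<>]", " ");

        // Step 4: collapse runs of whitespace and trim the ends.
        return Regex.Replace(withoutBrackets, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened version of the string from the question.
        string msg = "&lt;p&gt;&lt;img width=&quot;600&quot; /&gt;&lt;/p&gt;"
                   + "&lt;em&gt;&lt;strong&gt;&lt;Eleven people have been killed."
                   + "&lt;/strong&gt;&lt;/em&gt;";

        Console.WriteLine(HtmlTextExtractor.ExtractPlainText(msg));
        // prints "Eleven people have been killed."
    }
}
```

Two details are worth keeping in mind when comparing this with the attempts above. `msg.Substring(20, 105)` does not fail on the long string; it simply returns 105 characters of encoded markup, which is why the result looks wrong. And `msg.IndexOf(".<")` only finds a match after decoding, because the encoded string contains ".&lt;" rather than ".<"; on the encoded string it returns -1, the computed length becomes negative, and `Substring` throws `ArgumentOutOfRangeException`, which the `catch (IndexOutOfRangeException)` block in the question would not catch.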