Problem description
There are a couple of different ways to remove HTML tags from an NSString in Cocoa.
One way is to render the string into an NSAttributedString and then grab the rendered text.
Another way is to use NSXMLDocument's -objectByApplyingXSLTString:arguments:error: method to apply an XSLT transform that strips the tags.
Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, but this method won't work except in very trivial cases.
For example, these cases (from the Perl Cookbook chapter on the same subject) would break this method:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
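To make the failure concrete, here is a minimal sketch (not part of the original question) of the "look for opening and closing tag characters" idea, written with NSScanner. The helper name StripBetweenAngleBrackets is hypothetical; it simply discards everything from a "<" up to the next ">".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration only: drops everything between
// a '<' and the next '>' and keeps the rest of the text.
static NSString *StripBetweenAngleBrackets(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // Keep whitespace; by default NSScanner skips it.
    scanner.charactersToBeSkipped = nil;
    NSMutableString *result = [NSMutableString string];

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Copy plain text up to the next '<' (if any).
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the presumed tag: '<', its contents, and the first '>'.
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

Run against the cases above, the IMG line leaves ` B">` behind (the ">" inside the ALT attribute ends the tag early), the comment leaves ` -->`, and the script line collapses to `if (ac)`, because "<b && a>" is mistaken for a tag.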
Recommended answer
A quick and "dirty" solution (it removes everything between < and >) that works with iOS >= 3.2:
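- (NSString *)stringByStrippingHTML {
    NSRange r;
    // Manual reference counting; under ARC, use [self copy] without autorelease.
    NSString *s = [[self copy] autorelease];
    // Repeatedly remove the first <...> match until none remain.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}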
I have this declared as a category on NSString.
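For readers unfamiliar with categories, the declaration might look roughly like the sketch below. The category name HTMLStripping is an assumption (the answer does not name it), and this version assumes ARC, so the autorelease call is dropped.

```objective-c
#import <Foundation/Foundation.h>

// "HTMLStripping" is a hypothetical category name chosen for illustration.
@interface NSString (HTMLStripping)
- (NSString *)stringByStrippingHTML;
@end

@implementation NSString (HTMLStripping)

- (NSString *)stringByStrippingHTML {
    NSString *s = [self copy];  // ARC assumed: no autorelease needed
    NSRange r;
    // Remove one <...> match per pass until no match is left.
    while ((r = [s rangeOfString:@"<[^>]+>"
                         options:NSRegularExpressionSearch]).location != NSNotFound) {
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    }
    return s;
}

@end
```

With the category compiled in, [@"<p>Hello <b>world</b></p>" stringByStrippingHTML] returns "Hello world". The same caveats apply as for any "everything between < and >" approach: the edge cases listed in the question are still mishandled.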