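I'm trying to build an iOS app that simply extracts part of a web page.
I already have code that connects to the URL and stores the HTML in an NSString.
I tried the following, but the result I get is just an empty string: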
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the html data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body>" intoString:NULL];
    // Scan until the <body> tag is found
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag is stored in bodyText
}
I then tried a different approach:
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the html data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body" intoString:NULL];
    // Scan until the start of the opening <body tag is found
    [newScanner scanUpToString:@">" intoString:NULL];
    // Go to the end of the opening <body> tag
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag is stored in bodyText
}
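The second approach returns a string that starts with >< script... and so on.
To be honest, I don't have a good URL to test this against, and I think it might be easier with some help on a way to strip the tags inside the body (for example <p></p>).
Any help would be greatly appreciated.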
Best answer
I don't know why your first approach doesn't work. I'm assuming you declared bodyText before that snippet. This code works fine for me:
- (void)viewDidLoad {
    [super viewDidLoad];
    NSString *htmlData = @"This is some stuff before <body> this is the body </body> with some more stuff";
    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    NSString *bodyText;
    while (![newScanner isAtEnd]) {
        [newScanner scanUpToString:@"<body>" intoString:NULL];
        [newScanner scanString:@"<body>" intoString:NULL];
        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    }
    NSLog(@"%@",bodyText); // 2015-01-28 15:58:00.360 ScanningOfHTMLProblem[1373:661934] this is the body
}
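Note that I added a call to scanString:intoString: to skip past the first "<body>". scanUpToString:intoString: stops just before the string it is looking for, so without that extra call the next scan starts at the tag itself.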
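The same idea can be applied to the second approach from the question, where the result began with ">" because the scanner stopped just before the ">" of the opening tag and never consumed it. It may also explain the empty string in the first approach if the real page uses an opening tag with attributes, since a literal "<body>" would then never match; that is a guess, as the actual page is not shown. Below is a minimal sketch, not a definitive implementation. The helper names BodyInnerHTML and StripTags are hypothetical, and the tag stripping is deliberately naive: it does not decode entities and it keeps the text inside <script> or <style> elements. For real-world pages a proper HTML parser is likely a safer choice.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the markup between the opening <body ...> tag
// and </body>, or nil if either tag cannot be found.
// Note: the prefix "<body" would also match a tag such as <bodyfoo>.
static NSString *BodyInnerHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as it appears

    // Move to the opening tag. The return value is ignored on purpose:
    // scanUpToString: returns NO when the tag sits at the very start.
    [scanner scanUpToString:@"<body" intoString:NULL];
    if (![scanner scanString:@"<body" intoString:NULL]) return nil;

    // Skip any attributes, then consume the closing ">" of the opening tag.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) return nil;

    // Capture everything up to, but not including, the closing tag.
    NSString *body = nil;
    [scanner scanUpToString:@"</body>" intoString:&body];
    if (![scanner scanString:@"</body>" intoString:NULL]) return nil;

    return body ?: @""; // an empty body leaves `body` untouched
}

// Hypothetical helper: drops everything between "<" and ">" and keeps the rest.
static NSString *StripTags(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;
    NSMutableString *text = [NSMutableString string];

    while (!scanner.isAtEnd) {
        NSString *chunk = nil;
        // Copy the text that precedes the next tag, if there is any.
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [text appendString:chunk];
        }
        // Skip the tag itself, including both angle brackets.
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return text;
}
```

Calling StripTags(BodyInnerHTML(htmlData)) would then give the body text without markup. Setting charactersToBeSkipped to nil is a design choice here: by default NSScanner skips whitespace and newlines before each scan, which is convenient for the simple example above but would silently drop leading whitespace from captured HTML.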