Problem description
I am creating an iOS app that needs to get some data from a web page. My first thought was to use NSXMLParser's initWithContentsOfURL: and parse the HTML with the NSXMLParser delegate. However, this approach seems like it could quickly become painful: if the HTML changed, for example, I would have to rewrite the parsing code, which could be awkward.
Seeing as I'm loading a web page, I took a look at UIWebView too. It looks like UIWebView may be the way to go. stringByEvaluatingJavaScriptFromString: seems like a very handy way to extract the data, and it would allow the JavaScript to be stored in a separate file that would be easy to edit if the HTML changed. However, using UIWebView seems a bit hacky: it is a UIView subclass, so it may block the main thread, and the docs say that the JavaScript has a limit of 10MB.
Does anyone have any advice regarding parsing XML/HTML before I get stuck in?
Update:
I wrote a blog post about my solution: HTML parsing/screen scraping in iOS
Recommended answer
Parsing HTML with an XML parser usually does not work anyway, because many sites serve incorrect HTML. A web browser will cope with it, but a strict XML parser like NSXMLParser will fail on it completely.
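To make the strict-versus-lenient difference concrete, here is a minimal sketch in Python (used only for illustration, since it is the scripting language mentioned in this answer). It is not NSXMLParser itself; the assumption is that any strict XML parser rejects unclosed tags in a similar way. The sample markup and the helper names (strict_parse_ok, TextCollector, lenient_text) are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page fragment: the <br> and <li> tags are never closed,
# which browsers accept but well-formed XML forbids.
SLOPPY_HTML = "<html><body><p>Price: 42<br><ul><li>one<li>two</ul></body></html>"


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient HTML parser that gathers the non-empty text nodes it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def lenient_text(markup):
    """Feed the markup to the lenient parser and return the collected text."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse_ok(SLOPPY_HTML))
    print("lenient parser text:", lenient_text(SLOPPY_HTML))
```

The strict parser gives up at the first mismatched tag, while the lenient one still recovers the text, which is roughly the forgiving behaviour a browser or a dedicated scraping library provides.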
For many scripting languages there are great scraping libraries that are more forgiving, such as Python's Beautiful Soup module. Unfortunately, I do not know of any such modules for Objective-C.
Loading the page into a UIWebView might be the simplest way to go here. Note that you do not have to put the UIWebView on screen. You can create a separate UIWindow and add the UIWebView to it, so that you do full off-screen rendering. There was a WWDC2009 video about this, I think. As you already mention, it will not be lightweight though.
Depending on the data that you want and the complexity of the pages that you need to parse, you might also be able to parse it with regular expressions or even a hand-written parser. I have done this many times, and for simple data this works well.
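As a rough sketch of the regular-expression approach, the Python snippet below pulls name/price pairs out of a small, regular fragment. The markup, the class names ("name", "price") and the extract_items helper are all hypothetical; on iOS the same pattern could presumably be applied with NSRegularExpression. It only holds up while the page layout stays this simple and stable.

```python
import re

# Hypothetical markup; the class names and the values are made up.
PAGE = """
<div class="item"><span class="name">Widget</span><span class="price">$4.50</span></div>
<div class="item"><span class="name">Gadget</span><span class="price">$12.00</span></div>
"""

# One name span followed (optionally after whitespace) by one price span.
ITEM_RE = re.compile(
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="price">\$(?P<price>\d+(?:\.\d+)?)</span>'
)


def extract_items(html):
    """Return a list of (name, price) tuples found in the markup."""
    return [
        (match.group("name"), float(match.group("price")))
        for match in ITEM_RE.finditer(html)
    ]


if __name__ == "__main__":
    for name, price in extract_items(PAGE):
        print(name, price)
```

Keeping the pattern in one place means a change in the page's HTML only requires editing the expression, not the surrounding code.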