This article covers how to parse HTML using C.

Problem description

I need to grab some content from an HTML (XHTML valid) page. I grab the page using curl and store it in memory.

I played with the idea of using regex with the PCRE library, but I simply couldn't find any examples of using it with C. Then I moved on to HTML parsers, and again there is not a good selection. All I could find was a sparsely documented libxml module called HTMLparser.

Are there any alternatives? If not, are there any examples for what I have already found?

Recommended answer

You want to use HTML Tidy for this. The libcurl page has some sample source code to get you going; it shows how to traverse the DOM tree. You don't need an XML parser, and Tidy doesn't fail on badly formatted HTML.
