Problem description
I have articles on my website that I would like to have corrected and translated automatically. To do that, I need to extract the content without the surrounding HTML tags.
The idea is to have a regex that retrieves all the content between the tags (and, if possible, also the content found in tag attributes, such as <img alt='Little house'>). The problem is that I don't really know how to write such a regex. Any ideas?
Recommended answer
I would recommend using an HTML parser rather than relying on a regex. Parsing HTML with a regex is generally a no-no, and it is nearly impossible to get right for all cases. There are many questions here on SO that arrive at the same conclusion.
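To illustrate the parser approach, here is a minimal sketch using Python's standard-library html.parser module. The original question does not name a language, so Python is an assumption, and the names TextExtractor, extract_text, TEXT_ATTRS and SKIP_TAGS are hypothetical helpers invented for this example. It collects the text between tags plus the values of a few attributes (such as alt) that usually hold human-readable text, and it skips the contents of script and style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and selected attribute values."""

    # Attributes assumed to carry translatable text; adjust as needed.
    TEXT_ATTRS = ("alt", "title")
    # Elements whose contents are code or styling, not article text.
    SKIP_TAGS = ("script", "style")

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in self.TEXT_ATTRS and value:
                self.chunks.append(value.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text fragments found in an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.chunks


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><img alt='Little house'>"
    for fragment in extract_text(sample):
        print(fragment)
```

Each fragment can then be passed to the correction or translation step on its own. Writing the translated text back into the page would require tracking where each fragment came from, which this sketch does not attempt.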
EDIT: Looks like a couple of us had the same idea... Also, here is a question that discusses more parsers.