Converting xhtml back to html

I've got lots of xhtml pages that need to be fed to MS HTML Workshop to create CHM files. That application really hates xhtml, so I need to convert self-ending tags (e.g. ) to plain html (e.g. ).

Seems simple enough, but I'm having some trouble with it. Regexps trip up because I also have to take into account 'img', 'meta', 'link' tags, not just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not enough of a regexp pro to figure out that lookahead stuff.

Hi, I'm not sure if this is very helpful but the following works on the very simple example below.

    >>> import re
    >>> xhtml = 'hello <img src="/img.png"/> spam <br/> bye '
    >>> xtag = re.compile(r'<([^>]*?)/>')
    >>> xtag.sub(r'<\1>', xhtml)
    'hello <img src="/img.png"> spam <br> bye '

--
Arnaud
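For what it's worth, the pattern in the question most likely fails because `[^(/>)]` is a character class: the parentheses are literal there, so the class excludes `(`, `/`, `>` and `)`, and the match stops at the first `/` inside `src="/img.png"`. Below is a minimal sketch of a narrower variant of Arnaud's idea that only touches a fixed list of tags. The name `unclose_void_tags` and the contents of `VOID_TAGS` are assumptions made for illustration, not something from the thread.

```python
import re

# Tags that plain HTML writes without a closing slash.
# This tuple is an assumed subset; extend it to match your pages.
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input")

# Match "<tag ...attributes... />" for the listed tags only.
# [^<>]*? keeps the match inside one tag; \s*/> consumes the slash
# together with any whitespace in front of it.
SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)


def unclose_void_tags(xhtml):
    """Rewrite tags such as <br/> as <br>, leaving all other text alone."""
    return SELF_CLOSING.sub(r"<\1\2>", xhtml)


print(unclose_void_tags('hello <img src="/img.png"/> spam <br/> bye '))
# prints: hello <img src="/img.png"> spam <br> bye
```

Restricting the match to known tags means something like `<div/>` is left untouched instead of becoming an unclosed `<div>`. It is still a regexp, so a `<br/>` that appears inside a comment or a script would be rewritten too.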
"Gary Herron" <gh*****@islandtraining.com> wrote in message news:ma**************************************@pyth on.org...
> Tim Arnold wrote:
>> I've got lots of xhtml pages that need to be fed to MS HTML Workshop to create CHM files. That application really hates xhtml, so I need to convert self-ending tags (e.g. ) to plain html (e.g. ).
>>
>> Seems simple enough, but I'm having some trouble with it. Regexps trip up because I also have to take into account 'img', 'meta', 'link' tags, not just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not enough of a regexp pro to figure out that lookahead stuff.
>>
>> I'm not sure where to start; I looked at BeautifulSoup and BeautifulStoneSoup, but I can't see how to modify the actual tag.
>>
>> thanks,
>> --Tim Arnold
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> Whether or not you can find an application that does what you want, I don't know, but at the very least I can say this much.
>
> You should not be reading and parsing the text yourself! XHTML is valid XML, and there are lots of ways to read and parse XML with Python. (ElementTree is what I use, but other choices exist.) Once you use an existing package to read your files into an internal tree structure representation, it should be a relatively easy job to traverse the tree to emit the tags and text you want.
>
> Gary Herron

I agree and I'd really rather not parse it myself. However, ET will clean up the file, which in my case includes some comments required as metadata, so that won't work. Oh, I could get ET to read it and write a new parser--I see what you mean. I think I need to subclass so I could get ET to honor those comments too.

That's one way to go, I was just hoping for something easier.

thanks,
--Tim
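A rough sketch of the ElementTree route Gary describes, for anyone landing here later. It assumes Python 3.8 or newer, where `TreeBuilder` accepts `insert_comments=True`, so no subclassing is needed to keep comments. The function name `xhtml_to_html` is made up for this example. It also assumes the input is well-formed XML without undeclared entities such as `&nbsp;`, and comments outside the root element may still be dropped.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree puts on every tag of an XHTML document.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_html(xhtml_text):
    """Parse XHTML, keep comments, and serialize it with HTML rules."""
    # insert_comments=True (Python 3.8+) keeps comments as tree nodes.
    builder = ET.TreeBuilder(insert_comments=True)
    parser = ET.XMLParser(target=builder)
    root = ET.fromstring(xhtml_text, parser=parser)

    # Strip the XHTML namespace so tags serialize as "br", not "html:br".
    # Comment nodes have a non-string tag, so they are skipped here.
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(XHTML_NS):
            element.tag = element.tag[len(XHTML_NS):]

    # method="html" writes empty elements such as <br> and <img> unclosed.
    return ET.tostring(root, encoding="unicode", method="html")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<!-- meta: keep -->'
    '<p>hello <img src="/img.png"/> spam <br/> bye</p>'
    '</body></html>'
)
print(xhtml_to_html(sample))
```

The HTML serializer decides which tags stay unclosed from its own built-in list, so 'img', 'meta' and 'link' are handled without listing them by hand. The cost is that the whole file is re-serialized, so whitespace and quoting may not match the input byte for byte.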