Problem description
I'm learning BeautifulSoup and have found many "html2text" solutions, but the one I'm looking for should mimic the formatting:
<ul>
<li>One</li>
<li>Two</li>
</ul>
would become
* One
* Two
and
Some text
<blockquote>
More magnificent text here
</blockquote>
Final text
would become
Some text
More magnificent text here
Final text
I'm reading the docs, but I'm not seeing anything straightforward. Any help? I'm open to using something other than BeautifulSoup.
Recommended answer
Take a look at Aaron Swartz's html2text script (it can be installed with pip install html2text). Note that the output is valid Markdown. If for some reason that doesn't fully suit you, some fairly trivial tweaks should get you the exact output in your question:
In [1]: import html2text
In [2]: h1 = """<ul>
...: <li>One</li>
...: <li>Two</li>
...: </ul>"""
In [3]: print(html2text.html2text(h1))
* One
* Two
In [4]: h2 = """<p>Some text
...: <blockquote>
...: More magnificent text here
...: </blockquote>
...: Final text</p>"""
In [5]: print(html2text.html2text(h2))
Some text
> More magnificent text here
Final text
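One possible form of the "trivial tweaks" mentioned above is a small post-processing step over the Markdown string, sketched here with only the standard library. The function name plain_from_markdown is hypothetical, and the sample strings are assumed to resemble what html2text.html2text() returns (indented "* " bullets, "> " blockquote markers); check them against your own output before relying on this.

```python
import re


def plain_from_markdown(markdown_text):
    """Strip blockquote markers and bullet indentation from Markdown text.

    Hypothetical helper: markdown_text is assumed to be the string
    returned by html2text.html2text(html).
    """
    cleaned = []
    for line in markdown_text.splitlines():
        # Drop one or more leading "> " blockquote markers.
        line = re.sub(r"^\s*(?:>\s?)+", "", line)
        # Drop indentation in front of a "* " list bullet.
        line = re.sub(r"^\s+(?=\*\s)", "", line)
        cleaned.append(line.rstrip())
    text = "\n".join(cleaned)
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    # Sample strings standing in for html2text output (an assumption).
    print(plain_from_markdown("  * One\n  * Two\n\n"))
    print(plain_from_markdown(
        "Some text\n\n> More magnificent text here\n\nFinal text\n"
    ))
```

This keeps the blank lines between paragraphs; if the text should be fully compact, the blank lines can be filtered out as well.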