Python: converting HTML to text while mimicking the formatting

Question

I'm learning BeautifulSoup and have found many "html2text" solutions, but the one I'm looking for should mimic the formatting:

<ul>
<li>One</li>
<li>Two</li>
</ul>

would become

* One
* Two

Some text
<blockquote>
More magnificent text here
</blockquote>
Final text

would become

Some text

    More magnificent text here

Final text

I'm reading the docs, but I'm not seeing anything straightforward. Any help? I'm open to using something other than BeautifulSoup.

Recommended answer

Take a look at Aaron Swartz's html2text script (it can be installed with pip install html2text). Note that the output is valid Markdown. If for some reason that doesn't fully suit you, some fairly trivial tweaks should get you the exact output in your question:

In [1]: import html2text

In [2]: h1 = """<ul>
   ...: <li>One</li>
   ...: <li>Two</li>
   ...: </ul>"""

In [3]: print(html2text.html2text(h1))
  * One
  * Two

In [4]: h2 = """<p>Some text
   ...: <blockquote>
   ...: More magnificent text here
   ...: </blockquote>
   ...: Final text</p>"""

In [5]: print(html2text.html2text(h2))
Some text

> More magnificent text here

Final text
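
The "trivial tweaks" mentioned above could look something like the following. This is only a minimal sketch: the helper name tweak_markdown is hypothetical (it is not part of html2text), and it assumes the Markdown looks like the output printed in the session above, where list items are indented before the "*" and blockquote lines start with ">".

```python
import re


def tweak_markdown(md: str) -> str:
    """Hypothetical helper: reshape Markdown text into the layout from the question."""
    lines = []
    for line in md.splitlines():
        if line.startswith(">"):
            # Blockquote: drop the '>' marker and indent by four spaces instead.
            line = "    " + line[1:].strip()
        else:
            # List item: remove the indentation in front of the '*' bullet.
            line = re.sub(r"^\s+\* ", "* ", line)
        lines.append(line.rstrip())
    # Trim leading/trailing blank lines left over from the conversion.
    return "\n".join(lines).strip("\n")


# Markdown copied from the session above (assumed, not regenerated here).
list_md = "  * One\n  * Two\n"
quote_md = "Some text\n\n> More magnificent text here\n\nFinal text\n"

print(tweak_markdown(list_md))
print(tweak_markdown(quote_md))
```

In practice you would pass it the converter's result, for example tweak_markdown(html2text.html2text(h2)). Nested lists and nested blockquotes are not handled by this sketch.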
