Wrapping multiple tags with BeautifulSoup

Problem description

I'm writing a Python script that converts an HTML document into a reveal.js slideshow. To do this, I need to wrap multiple tags inside a <section> tag.

It's easy to wrap a single tag inside another one using the wrap() method. However, I can't figure out how to wrap multiple tags.
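
For reference, the single-tag case mentioned above can be sketched as follows. This is a minimal illustration, not taken from the original script; the one-line HTML string and the variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Illustrative one-paragraph document (not the original html_doc)
soup = BeautifulSoup("<p>Some text...</p>", "html.parser")

# wrap() puts the tag it is called on inside the tag passed as the argument
# and returns that wrapper.
section = soup.new_tag("section")
wrapper = soup.p.wrap(section)

result = str(soup)
print(result)  # <section><p>Some text...</p></section>
```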

To clarify, here is the original HTML:

html_doc = """
<html>

<head>
  <title>The Dormouse's story</title>
</head>

<body>

  <h1 id="first-paragraph">First paragraph</h1>
  <p>Some text...</p>
  <p>Another text...</p>
  <div>
    <a href="http://link.com">Here's a link</a>
  </div>

  <h1 id="second-paragraph">Second paragraph</h1>
  <p>Some text...</p>
  <p>Another text...</p>

  <script src="lib/.js"></script>
</body>

</html>
"""



I'd like to wrap each <h1> and the tags that follow it inside a <section> tag, like this:

<html>
<head>
  <title>The Dormouse's story</title>
</head>
<body>

  <section>
    <h1 id="first-paragraph">First paragraph</h1>
    <p>Some text...</p>
    <p>Another text...</p>
    <div>
      <a href="http://link.com">Here's a link</a>
    </div>
  </section>

  <section>
    <h1 id="second-paragraph">Second paragraph</h1>
    <p>Some text...</p>
    <p>Another text...</p>
  </section>

  <script src="lib/.js"></script>
</body>

</html>

Here is how I select the elements:

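from bs4 import BeautifulSoup
import itertools

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html_doc, 'html.parser')
h1s = soup.find_all('h1')
for el in h1s:
    # Take everything after this <h1> until the next <h1> or the <script> tag
    els = list(itertools.takewhile(lambda x: x.name not in [el.name, 'script'], el.next_elements))
    els.insert(0, el)
    print(els)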

Output:

[<h1 id="first-paragraph">First paragraph</h1>, 'First paragraph', '\n  ', <p>Some text...</p>, 'Some text...', '\n  ', <p>Another text...</p>, 'Another text...', '\n  ', <div><a href="http://link.com">Here's a link</a>  </div>, '\n    ', <a href="http://link.com">Here's a link</a>, "Here's a link", '\n  ', '\n\n  ']

[<h1 id="second-paragraph">Second paragraph</h1>, 'Second paragraph', '\n  ', <p>Some text...</p>, 'Some text...', '\n  ', <p>Another text...</p>, 'Another text...', '\n\n  ']

The selection is correct but I can't see how to wrap each selection inside a <section> tag.

Recommended answer

I eventually found how to use the wrap() method in this case. The key was understanding that every change to the soup object is made in place.
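
To illustrate what "in place" means here, the following is a small sketch on a made-up three-tag document (the HTML string and variable names are assumptions for the example). It suggests why the siblings should be collected into a list before wrap() is called: once the heading is wrapped, it no longer has any siblings, and append() moves a node instead of copying it.

```python
from bs4 import BeautifulSoup

# Illustrative document (not the original html_doc)
soup = BeautifulSoup("<body><h1>Title</h1><p>one</p><p>two</p></body>", "html.parser")
h1 = soup.h1

# next_siblings is a lazy generator over the live tree, so snapshot it first
siblings = list(h1.next_siblings)

section = soup.new_tag("section")
h1.wrap(section)  # the tree changes immediately: h1 now sits inside <section>

# After wrapping, h1 is the only child of <section>, so it has no siblings left
after_wrap = list(h1.next_siblings)

for tag in siblings:
    section.append(tag)  # append() moves the node out of <body> into <section>

result = str(soup)
print(result)
```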

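from bs4 import BeautifulSoup
import itertools

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html_doc, 'html.parser')

# wrap all h1 and next siblings into sections
h1s = soup.find_all('h1')
for el in h1s:
    # Collect the siblings first, before the tree is modified
    els = list(itertools.takewhile(
              lambda x: x.name not in [el.name, 'script'],
              el.next_siblings))
    section = soup.new_tag('section')
    # wrap() replaces el with <section> in the tree and puts el inside it
    el.wrap(section)
    # append() moves each collected sibling into the section
    for tag in els:
        section.append(tag)

print(soup.prettify())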

This gives me the desired output. Hope that helps.
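
If the same grouping is needed in several places, the loop above could be folded into a small helper. This is only a sketch: the function name wrap_in_sections, its heading and stop_names parameters, and the short sample document are hypothetical and not part of the original answer or of BeautifulSoup itself.

```python
from itertools import takewhile

from bs4 import BeautifulSoup


def wrap_in_sections(soup, heading="h1", stop_names=("script",)):
    """Wrap each `heading` tag and its following siblings in a <section>.

    Hypothetical helper: a group ends at the next `heading` tag or at any
    tag whose name is listed in `stop_names`.
    """
    boundaries = {heading, *stop_names}
    for el in soup.find_all(heading):
        # Collect the siblings before the tree is modified
        group = list(takewhile(lambda x: x.name not in boundaries, el.next_siblings))
        section = soup.new_tag("section")
        el.wrap(section)
        for node in group:
            section.append(node)  # moves the node into the section
    return soup


# Illustrative input (not the original html_doc)
sample = "<body><h1>A</h1><p>a1</p><h1>B</h1><p>b1</p><script></script></body>"
doc = wrap_in_sections(BeautifulSoup(sample, "html.parser"))

# Names of the direct child tags of each section
section_children = [
    [child.name for child in s.find_all(recursive=False)]
    for s in doc.find_all("section")
]
print(section_children)
```

Passing the boundary names as parameters keeps the stop condition in one place, so a different heading level such as h2 can be used without editing the loop.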


09-18 05:35