Problem description
I'm writing a Python script that converts an HTML document into a reveal.js slideshow. To do this, I need to wrap multiple tags inside a <section>
tag.
It's easy to wrap a single tag inside another one using the wrap()
method. However, I can't figure out how to wrap multiple tags.
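For reference, the single-tag case mentioned above can be sketched as follows. This is a minimal illustration, not part of the original script: the one-line HTML string and the explicit "html.parser" argument are assumptions made to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Assumed one-line document, used only for illustration.
soup = BeautifulSoup("<p>Some text...</p>", "html.parser")

# wrap() replaces the tag with the wrapper in the tree, moves the tag
# inside it, and returns the wrapper.
section = soup.p.wrap(soup.new_tag("section"))

print(soup)  # <section><p>Some text...</p></section>
```

wrap() only takes the one element it is called on, which is why several consecutive tags need an extra step.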
An example for clarification, the original HTML:
html_doc = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<h1 id="first-paragraph">First paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<div>
<a href="http://link.com">Here's a link</a>
</div>
<h1 id="second-paragraph">Second paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<script src="lib/.js"></script>
</body>
</html>
"""
I'd like to wrap each <h1>
and the tags that follow it inside <section>
tags, like this:
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<section>
<h1 id="first-paragraph">First paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<div>
<a href="http://link.com">Here's a link</a>
</div>
</section>
<section>
<h1 id="second-paragraph">Second paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
</section>
<script src="lib/.js"></script>
</body>
</html>
Here is how I make the selection:
from bs4 import BeautifulSoup
import itertools
soup = BeautifulSoup(html_doc)
h1s = soup.find_all('h1')
for el in h1s:
    # collect everything after this h1 up to the next h1 or the script tag
    els = [i for i in itertools.takewhile(lambda x: x.name not in [el.name, 'script'], el.next_elements)]
    els.insert(0, el)
    print(els)
Output:
[<h1 id="first-paragraph">First paragraph</h1>, 'First paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n ', <div><a href="http://link.com">Here's a link</a> </div>, '\n ', <a href="http://link.com">Here's a link</a>, "Here's a link", '\n ', '\n\n ']
[<h1 id="second-paragraph">Second paragraph</h1>, 'Second paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n\n ']
The selection is correct, but I can't see how to wrap each selection inside a <section>
tag.
Recommended answer
Finally, I found how to use the wrap
method in this case. I needed to understand that every change to the soup object is made in place.
from bs4 import BeautifulSoup
import itertools
soup = BeautifulSoup(html_doc)
# wrap all h1 and next siblings into sections
h1s = soup.find_all('h1')
for el in h1s:
    els = [i for i in itertools.takewhile(
        lambda x: x.name not in [el.name, 'script'],
        el.next_siblings)]
    section = soup.new_tag('section')
    el.wrap(section)
    for tag in els:
        section.append(tag)
print(soup.prettify())
This gives me the desired output. Hope that helps.
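The same approach can be packaged as a small reusable function. The sketch below is only an illustration under stated assumptions: the name wrap_in_sections, its heading and stop parameters, the shortened test document, and the explicit "html.parser" argument are all hypothetical and do not appear in the original answer. Two points it relies on: next_siblings yields only nodes at the same level as the heading (unlike next_elements, which also descends into children), and the siblings are collected into a list before anything is moved, because appending a node to the new section removes it from its old position in place.

```python
import itertools
from bs4 import BeautifulSoup


def wrap_in_sections(soup, heading="h1", stop=("script",)):
    """Wrap each heading and its following siblings in a <section> (hypothetical helper)."""
    boundary = (heading, *stop)
    for el in soup.find_all(heading):
        # Collect the siblings first: the tree is modified in place below.
        siblings = list(itertools.takewhile(
            lambda node: getattr(node, "name", None) not in boundary,
            el.next_siblings))
        # wrap() returns the new wrapper, which now contains the heading.
        section = el.wrap(soup.new_tag("section"))
        for node in siblings:
            # append() moves an existing node out of its old position.
            section.append(node)
    return soup


# Shortened, assumed test document.
html = ("<body><h1>A</h1><p>a1</p><div><a>x</a></div>"
        "<h1>B</h1><p>b1</p><script></script></body>")
demo = wrap_in_sections(BeautifulSoup(html, "html.parser"))
print(demo.prettify())
```

Tags listed in stop, such as the trailing script tag, stay outside the last section.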