I have the following HTML snippet:
<h3>Language</h3>
<a class="syllabus-item" href="">French</a>
<a class="syllabus-item" href="">English</a>
<a class="syllabus-item" href="">Spanish</a>
<h3>Music</h3>
<a class="syllabus-item" href="">Rock</a>
<a class="syllabus-item" href="">Pop</a>
I want the output to be:
1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
2 - Music/1 - Rock
2 - Music/2 - Pop
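Here is my code:
def get_genre_band(soup):
    genre = None
    # Walk headers and links in document order.
    for node in soup.findAll(['h3', 'a']):
        if node.name == 'h3':
            # Remember the most recent header as the current genre.
            genre = node.text
        elif 'syllabus-item' in node.get('class', ''):
            yield genre.strip(), node.text.strip()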
I am using it like this:
for g, b in get_genre_band(section):
    print("{} \n\t{}".format(g, b))
But I can't get the numbering right. This is what I get:
1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
8 - Music/4 - Rock
9 - Music/5 -Pop
Best answer
You can use .next_sibling for this task.
Code:
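# Number the headers starting from 1.
for i, header in enumerate(soup.find_all('h3'), 1):
    next_tag = header
    j = 1  # item counter, restarted for every header
    while True:
        next_tag = next_tag.next_sibling
        # Stop at the end of the siblings or at the next header.
        if next_tag is None or next_tag.name == 'h3':
            break
        # Text nodes (e.g. whitespace) have name None; skip them.
        if next_tag.name is not None:
            print('{} - {}/{} - {}'.format(i, header.text, j, next_tag.string))
            j += 1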
Output:
1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
2 - Music/1 - Rock
2 - Music/2 - Pop
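The same numbering can also be kept in the generator style of the question. Below is a minimal sketch of that idea, not the answerer's code: it keeps two counters and restarts the item counter whenever a new header appears. The function name number_sections and the (tag_name, text) pair format are assumptions made for illustration. With BeautifulSoup, such pairs could presumably be produced by something like ((n.name, n.get_text()) for n in soup.find_all(['h3', 'a'])); the sample list below stands in for that so the sketch runs without any third-party library.

```python
def number_sections(nodes, header_tag="h3"):
    """Yield (section_no, section, item_no, item) tuples.

    `nodes` is an iterable of (tag_name, text) pairs in document order
    (a hypothetical input format chosen for this sketch).
    """
    section_no = 0
    item_no = 0
    section = None
    for name, text in nodes:
        if name == header_tag:
            # New header: bump the section counter, restart the item counter.
            section_no += 1
            item_no = 0
            section = text.strip()
        elif section is not None:
            # Any other node after a header counts as an item of that header.
            item_no += 1
            yield section_no, section, item_no, text.strip()


# Sample data mirroring the HTML in the question.
nodes = [
    ("h3", "Language"), ("a", "French"), ("a", "English"), ("a", "Spanish"),
    ("h3", "Music"), ("a", "Rock"), ("a", "Pop"),
]

lines = ["{} - {}/{} - {}".format(*row) for row in number_sections(nodes)]
for line in lines:
    print(line)
```

Items that appear before the first header are skipped here, which is a design choice of this sketch rather than something the question specifies.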
Regarding "python - BeautifulSoup: extracting text between two tags", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48199455/