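I am trying to scrape two values from a web page using BeautifulSoup. When I print only one value, the content looks fine. However, when I print both values on the same line, HTML code shows up around one of them.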

Here is my code:

from bs4 import BeautifulSoup
import urllib.request as urllib2


list_open = open("source.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")


i = 0
for url in line_in_list:
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
    sku = soup.find_all(attrs={'class': "identifier"})
    description = soup.find_all(attrs={'class': "description"})
    for text in description:
        print((sku), text.getText())
    i += 1


The output looks like this:

[<span class="identifier">112404</span>] A natural for...etc
[<span class="identifier">110027</span>] After what...etc
[<span class="identifier">03BA5730</span>] Argentina is know...etc
[<span class="identifier">090030</span>] To be carried...etc


Ideally, the output should not have the [<span class="identifier"> and </span>] markup around the numbers.

I guess the problem is in the last for loop, but I don't know how to correct it. Any help is appreciated. Thanks! - Espen

Best answer

It looks like you need to zip() the identifiers and descriptions, and call getText() on each tag found in the loop:

identifiers = soup.find_all(attrs={'class': "identifier"})
descriptions = soup.find_all(attrs={'class': "description"})

for identifier, description in zip(identifiers, descriptions):
    print(identifier.getText(), description.getText())
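
For anyone who wants to try this without fetching a page, here is a minimal sketch of the same idea run against inline markup. The SAMPLE_HTML string and the extract_pairs helper are made up for illustration and are not part of the original code; the markup only assumes the "identifier" and "description" class names from the question. Printing the result of find_all() directly shows the list of tags, brackets and all, which is most likely where the [<span ...>] in the output came from; pulling the text out of each tag avoids that.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for one fetched page.
SAMPLE_HTML = """
<div><span class="identifier">112404</span><p class="description">A natural for...</p></div>
<div><span class="identifier">110027</span><p class="description">After what...</p></div>
"""


def extract_pairs(html):
    """Return (identifier, description) text pairs found in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    identifiers = soup.find_all(attrs={"class": "identifier"})
    descriptions = soup.find_all(attrs={"class": "description"})
    # zip() pairs the tags by position; get_text() drops the surrounding markup.
    return [
        (identifier.get_text(strip=True), description.get_text(strip=True))
        for identifier, description in zip(identifiers, descriptions)
    ]


pairs = extract_pairs(SAMPLE_HTML)
for sku, text in pairs:
    print(sku, text)
```

Note that zip() stops at the shorter of the two lists, so if a page has a description without an identifier (or the other way around), the pairs may be cut short or misaligned.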

Regarding python - How to print multiple values from BeautifulSoup with Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35399668/
