python - BeautifulSoup: scraping table data

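I want to extract table data from the URL below. Specifically, I want the data in the first column. When I run the following code, each first-column value is printed several times. How do I get each value only once, as it appears in the table?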

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')
    for cell in data:
        print(data[0].text)

Best answer

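Try this. In the question's code, the inner loop runs once per cell in the row but always prints data[0], so each first-column value is printed once for every column. Dropping the inner loop and printing data[0] once per row fixes that. The length check skips rows that contain no td cells (for example, a header row made of th cells):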

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    # All <td> cells in this row; empty for rows without <td> cells.
    data = row.find_all('td')

    if data:
        # Print only the first cell, once per row.
        cell = data[0]
        print(cell.text)
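
The same idea can be wrapped in a small function that returns the first-column values as a list instead of printing them. This is only a sketch under stated assumptions: the helper name first_column and the SAMPLE_HTML markup are made up for illustration (the sample stands in for the live page so the example runs offline), and it uses Python's built-in 'html.parser' rather than 'lxml' so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page (an assumption,
# not a copy of the real page content).
SAMPLE_HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>Fresh vegetables</td><td>$15.00</td></tr>
  <tr><td>Russian Nesting Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""


def first_column(html, table_id="giftList"):
    """Return the text of the first <td> in each row of the given table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        # No table with that id: nothing to extract.
        return []

    values = []
    for row in table.find_all("tr"):
        # find() returns the first <td> in the row, or None if the row
        # has no <td> cells (such as the <th>-only header row above).
        first = row.find("td")
        if first is not None:
            values.append(first.get_text(strip=True))
    return values


print(first_column(SAMPLE_HTML))
```

To run it against the real page, pass in the html value fetched with urlopen in the answer above instead of SAMPLE_HTML.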

