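I am trying to use Python to save the data under the columns on the site whose codes look like "SVENYXX", where "XX" is the number that follows (e.g. 01, 02, etc.).
With the code below I can get the values from all of the desired columns, but only the bare values. Is there a way to include the column headers and the row headers as well?
I know I already have the headers in a variable, but is there a way to include them in the printed output?
And how would I go about including all of the rows?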
from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
soup = BeautifulSoup(page)
desired_table = soup.findAll('table')[2]

# Find the columns you want data from
headers = desired_table.findAll('th')
desired_columns = []
for th in headers:
    if 'SVENY' in th.string:
        desired_columns.append(headers.index(th))

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')
for row in rows[1:]:
    cells = row.findAll('td')
    for column in desired_columns:
        print(cells[column].text)
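Best answer
How about this?
I added th.getText() so that each entry in desired_columns is a list holding both the column index and the column name. I then added row_name = row.findNext('th').getText() to get the row header for each row.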
from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page, 'html.parser')
desired_table = soup.findAll('table')[2]

# Find the columns you want data from, keeping each column's index and name
headers = desired_table.findAll('th')
desired_columns = []
for th in headers:
    # getText() always returns a str, whereas th.string is None for nested markup
    if 'SVENY' in th.getText():
        desired_columns.append([headers.index(th), th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')
for row in rows[1:]:
    cells = row.findAll('td')
    # The first <th> after the start of the row holds the row header
    row_name = row.findNext('th').getText()
    for column in desired_columns:
        print(cells[column[0]].text, row_name, column[1])
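If the goal is to save the result rather than print it, the same idea can be sketched as a small function that returns a header row followed by one list per data row. This is only a sketch under stated assumptions: SAMPLE_HTML is a made-up miniature table (the values are not real data), extract_columns is a hypothetical helper name, and the layout assumed here (one header row, then rows that start with a th row header) may not match the real page exactly. Reading th and td cells together keeps the column positions aligned with the header row.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: made-up values, assumed layout.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>SVENY01</th><th>BETA0</th><th>SVENY02</th></tr>
  <tr><th>2015-06-01</th><td>0.30</td><td>4.10</td><td>0.70</td></tr>
  <tr><th>2015-06-02</th><td>0.31</td><td>4.12</td><td>0.72</td></tr>
</table>
"""


def extract_columns(table, prefix="SVENY"):
    """Return [header_row, data_row, ...] for columns whose name starts with prefix."""
    rows = table.find_all("tr")
    # Read <th> and <td> together so positions line up across all rows.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    wanted = [i for i, name in enumerate(header) if name.startswith(prefix)]

    # First output row: the row-header label plus the selected column names.
    result = [[header[0]] + [header[i] for i in wanted]]
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(cells) < len(header):
            continue  # skip spacer or malformed rows
        result.append([cells[0]] + [cells[i] for i in wanted])
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
data = extract_columns(soup.find("table"))

# Write the rows as CSV; an in-memory buffer stands in for a real file here.
buffer = io.StringIO()
csv.writer(buffer).writerows(data)
print(buffer.getvalue())
```

To run this against the live page, the soup would come from the downloaded HTML and the table from soup.findAll('table')[2] as in the answer, and the buffer could be swapped for open('output.csv', 'w', newline='').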
This question, "python - Scraping with BeautifulSoup: want to scrape entire column including header and title rows", comes from Stack Overflow: https://stackoverflow.com/questions/30741576/