I am trying to use Python to save the data under the columns whose code is "SVENYXX", where "XX" is the number that follows (e.g. 01, 02, etc.) on the site.

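With the code below I can get the first line of all the "column" data I want. However, is there a way to include the header and the row titles in there as well?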

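I know I already have the headers, but is there a way to include them in the data that gets output?
Also, how would I go about including all of the rows?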

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly to avoid the bs4 warning

desired_table = soup.findAll('table')[2]

# Find the columns you want data from
headers = desired_table.findAll('th')
desired_columns = []
for th in headers:
    if 'SVENY' in th.string:
        desired_columns.append(headers.index(th))

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    for column in desired_columns:
        print(cells[column].text)

Best answer

How about this?

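I added th.getText() so that each entry in the desired_columns list holds the column name alongside the column index, and then added row_name = row.findNext('th').getText() to get the row title.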

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly to avoid the bs4 warning

desired_table = soup.findAll('table')[2]

# Find the columns you want data from
headers = desired_table.findAll('th')
desired_columns = []
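for index, th in enumerate(headers):
    # getText() also works when th.string is None (e.g. nested tags),
    # and enumerate avoids headers.index() matching an identical earlier <th>
    if 'SVENY' in th.getText():
        desired_columns.append([index, th.getText()])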

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    row_name = row.findNext('th').getText()
    for column in desired_columns:
        print(cells[column[0]].text, row_name, column[1])
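The loop above prints one value at a time. If the goal is to save everything, with the column names on top and the row title at the start of each row, one option is to collect the cell text into a list of rows first and then write it out with the csv module. This is only a rough sketch: extract_rows, select_columns and to_csv are hypothetical helper names, the sample values are made up, and it assumes the first row of the table is the header row and the first cell of every row is the row title.

```python
import csv
import io


def extract_rows(table):
    """Turn a BeautifulSoup <table> Tag into lists of cell text, one per <tr>.

    Both <th> and <td> cells are kept in document order, so the header
    row and the row titles stay aligned with the data cells.
    """
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        for tr in table.find_all('tr')
    ]


def select_columns(rows, prefix='SVENY'):
    """Keep column 0 (assumed to be the row title) plus every column
    whose header text contains `prefix`."""
    header = rows[0]
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and prefix in name]
    # Skip rows whose cell count differs from the header (e.g. spacer rows).
    return [[row[i] for i in keep] for row in rows if len(row) == len(header)]


def to_csv(rows):
    """Serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for extract_rows(desired_table)
sample = [
    ['Date', 'BETA0', 'SVENY01', 'SVENY02'],
    ['2015-06-01', '3.9', '0.26', '0.68'],
    ['2015-06-02', '4.0', '0.27', '0.70'],
]
print(to_csv(select_columns(sample)))
```

With the real page you would pass extract_rows(desired_table) instead of sample, and write the result of to_csv to a file rather than printing it. Whether the header row really carries a cell for the row-title column depends on the page's markup, so check the first couple of rows before relying on the alignment.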

Regarding python - Scraping with BeautifulSoup: scraping an entire column, including the header row and the title row, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30741576/
