Problem description
I've been working with BeautifulSoup lately. I'm trying to get data from the page at https://www.pro-football-reference.com/teams/mia/2000_roster.htm. Specifically, all I want is the player name and 'gs' (games started).
However, my code only returns the data from the 1st table ('Starters'). I'm not interested in that top table at all; I want the 2nd table, titled 'Roster'.
Here's the code I was using. Like I said, I don't really want or need anything other than player name and games started; I was just practicing and learning BeautifulSoup.
import pandas as pd
import requests
import bs4

alpha = requests.get('https://www.pro-football-reference.com/teams/mia/2000_roster.htm')
beta = bs4.BeautifulSoup(alpha.text, 'lxml')

# Positions: skip the header cell and drop empty strings
gama = beta.find_all('th', {'data-stat': 'pos'})
position = [th.text for th in gama]
position = position[1:]
position = list(filter(None, position))

# Player names: skip the first cell and drop the section divider rows
gama = beta.find_all('td', {'data-stat': 'player'})
player = [td.text for td in gama]
player = player[1:]
while 'Defensive Starters' in player: player.remove('Defensive Starters')
while 'Special Teams Starters' in player: player.remove('Special Teams Starters')

# Ages
gama = beta.find_all('td', {'data-stat': 'age'})
age = [td.text for td in gama]
age = list(filter(None, age))

# Games started
gama = beta.find_all('td', {'data-stat': 'gs'})
gs = [td.text for td in gama]
gs = list(filter(None, gs))

target = pd.DataFrame(
    {
        'player_name': player,
        'position': position,
        'gs': gs,
        'age': age
    })
Does anyone see where I'm going wrong? Or is there an alternative way to go about it?
Recommended answer
To get the content of that table you need a browser simulator, because that part of the response is generated dynamically, so a plain requests call does not see it. The data in the first table is accessible without a browser simulator, though. I used Selenium in this case:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")

# The second container on the rendered page holds the 'Roster' table
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    player = items.select("[data-stat='player']")[0].text
    gs = items.select("[data-stat='gs']")[0].text
    print(player, gs)

driver.quit()
Partial output:
Player GS
Trace Armstrong* 0
John Bock 1
Tim Bowens 15
Lorenzo Bromell 0
Autry Denson 0
Mark Dixon 15
Kevin Donnalley 16
If for some reason you run into an error with the code above (for example, an IndexError on a row that lacks one of these cells), the version below guards against it by falling back to an empty string:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")

table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    # Fall back to an empty string when a row has no matching cell
    player = items.select("[data-stat='player']")[0].text if items.select("[data-stat='player']") else ""
    gs = items.select("[data-stat='gs']")[0].text if items.select("[data-stat='gs']") else ""
    print(player, gs)

driver.quit()
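A possible alternative that avoids launching a browser: pages like this sometimes ship their secondary tables inside HTML comments in the raw response, and a script then un-comments them in the browser. Whether that applies to this page is an assumption here, not something verified above. If it does, the commented-out markup can be parsed a second time with BeautifulSoup. The sketch below shows the idea on a small made-up HTML string; the helper names `tables_in_comments` and `player_gs_rows` and the `SAMPLE_HTML` markup are hypothetical, and `html.parser` is used only to keep the example dependency-free.

```python
from bs4 import BeautifulSoup, Comment

def tables_in_comments(html):
    """Return every <table> element found inside an HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            # Re-parse the comment text as markup of its own
            inner = BeautifulSoup(str(comment), "html.parser")
            tables.extend(inner.find_all("table"))
    return tables

def player_gs_rows(table):
    """Collect (player, gs) pairs, skipping rows that lack either cell."""
    rows = []
    for tr in table.select("tr"):
        player = tr.select_one("[data-stat='player']")
        gs = tr.select_one("[data-stat='gs']")
        if player is not None and gs is not None:
            rows.append((player.get_text(strip=True), gs.get_text(strip=True)))
    return rows

# Made-up markup for illustration: one visible table, one commented-out table
SAMPLE_HTML = """
<div><table><tr>
  <td data-stat="player">Visible Starter</td><td data-stat="gs">16</td>
</tr></table></div>
<div><!--
<table>
  <tr><td data-stat="player">John Bock</td><td data-stat="gs">1</td></tr>
  <tr><th>divider row without data cells</th></tr>
  <tr><td data-stat="player">Tim Bowens</td><td data-stat="gs">15</td></tr>
</table>
--></div>
"""

hidden_tables = tables_in_comments(SAMPLE_HTML)
for name, games_started in player_gs_rows(hidden_tables[0]):
    print(name, games_started)
```

To try this against the real page, the text returned by `requests.get(page_url).text` could be passed to `tables_in_comments`, and the resulting list of pairs could be handed to `pd.DataFrame(rows, columns=['player_name', 'gs'])`. If no tables come back, the page is probably not built this way and the Selenium approach above remains the safer route.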