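I am trying to scrape the following page: https://www.pro-football-reference.com/boxscores/

It is a page of American football box scores. I want the date, the winner, and the loser of each game. I can get the date easily, but I cannot figure out how to isolate and extract the team names for the winner and the loser.
Here is what I have so far...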

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


#assigning url
my_url = 'https://www.pro-football-reference.com/boxscores/'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html,"html.parser")

games = page_soup.findAll("div",{"class":"game_summary expanded nohover"})


for game in games:
    date_block = game.findAll("tr",{"class":"date"})
    date_val = date_block[0].text
    winner_block = game.findAll("tr",{"class":"winner"})
    #here I need a line that returns the game winner, e.g. "Philadelphia Eagles"
    loser = game.findAll("tr",{"class":"loser"})


Here is the relevant HTML...

<div class="game_summary expanded nohover">
<table class="teams">
    <tbody>
        <tr class="date">
            <td colspan="3">Sep 6, 2018</td>
        </tr>
        <tr class="loser">
            <td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td>
            <td class="right">12</td>
            <td class="right gamelink">
                <a href="/boxscores/201809060phi.htm">Final</a>
            </td>
        </tr>
        <tr class="winner">
            <td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td>
            <td class="right">18</td>
            <td class="right">
            </td>
        </tr>
    </tbody>
</table>
<table class="stats">
    <tbody>
        <tr>
            <td><strong>PassYds</strong></td>
            <td><a href="/players/R/RyanMa00.htm" title="Matt Ryan">Ryan</a>-ATL</td>
            <td class="right">251</td>
        </tr>
        <tr>
            <td><strong>RushYds</strong></td>
            <td><a href="/players/A/AjayJa00.htm" title="Jay Ajayi">Ajayi</a>-PHI</td>
            <td class="right">62</td>
        </tr>
        <tr>
            <td><strong>RecYds</strong></td>
            <td><a href="/players/J/JoneJu02.htm" title="Julio Jones">Jones</a>-ATL</td>
            <td class="right">169</td>
        </tr>
    </tbody>
</table>




I get an error message saying that the ResultSet object has no attribute 'td'. Any help would be greatly appreciated.

Best answer

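Be careful with tied games. I think that is what causes your error: in a tie there is no winner, so no row with the winner class is found. Note also that findAll returns a ResultSet (a list of tags), which has no td attribute, whereas find returns a single Tag or None. The following code uses find, checks that a winner row exists, and prints the date and the winner.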

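for game in games:
    # find() returns a single Tag (or None), unlike findAll(), which returns a ResultSet
    date_block = game.find('tr',{'class':'date'})
    date_val = date_block.text
    winner_block = game.find('tr',{'class':'winner'})
    # skip games that have no winner row (e.g. ties)
    if winner_block:
        winner = winner_block.find('a').text
        print(date_val)
        print(winner)
    # loser rows are fetched here but not used in this snippet
    loser = game.findAll('tr',{'class':'loser'})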


Output:

Sep 6, 2018
Philadelphia Eagles
Sep 9, 2018
New England Patriots
Sep 9, 2018
Tampa Bay Buccaneers
Sep 9, 2018
Minnesota Vikings
Sep 9, 2018
Miami Dolphins
Sep 9, 2018
Cincinnati Bengals
Sep 9, 2018
Baltimore Ravens
Sep 9, 2018
Jacksonville Jaguars
Sep 9, 2018
Kansas City Chiefs
Sep 9, 2018
Denver Broncos
Sep 9, 2018
Washington Redskins
Sep 9, 2018
Carolina Panthers
Sep 9, 2018
Green Bay Packers
Sep 10, 2018
New York Jets
Sep 10, 2018
Los Angeles Rams
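
To make the error from the question concrete, here is a minimal sketch that reproduces it on a trimmed copy of the HTML shown above, with no network access. The variable names (`html`, `rows`, `first_row`, and so on) are mine, not from the original post. It only illustrates the difference between a ResultSet and a single Tag, and is not meant as a full scraper.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page
# still looks like this; the site may have changed since 2018).
html = """
<table class="teams">
  <tr class="date"><td colspan="3">Sep 6, 2018</td></tr>
  <tr class="loser">
    <td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td>
    <td class="right">12</td>
  </tr>
  <tr class="winner">
    <td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td>
    <td class="right">18</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() (alias: findAll) returns a ResultSet, i.e. a list of Tags.
rows = soup.find_all("tr", {"class": "winner"})
try:
    rows.td  # attribute access on the list itself, not on a Tag
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# find() returns the first matching Tag (or None), so .td works on it.
first_row = soup.find("tr", {"class": "winner"})
winner_name = first_row.td.a.text

# Indexing into the ResultSet gives a Tag as well.
same_name = rows[0].find("a").text

print(error_message)
print(winner_name)
```

So either switch to `find`, or keep `findAll` and index into the result (after checking that it is not empty).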

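Since the question also asks for the loser, here is a rough sketch of how date, winner, and loser could be collected together, with missing rows reported as None rather than raising. The helper names `team_name` and `summarize_games` are hypothetical. The second game in `sample_html` is made-up markup for a tie: the question does not show what a tied game looks like, so the "draw" class and the "Team A" / "Team B" names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def team_name(row):
    """Return the text of the first link in a table row, or None."""
    if row is None:
        return None
    link = row.find("a")
    return link.get_text(strip=True) if link else None


def summarize_games(page_html):
    """Collect date, winner and loser for each game_summary block."""
    soup = BeautifulSoup(page_html, "html.parser")
    results = []
    # class_="game_summary" matches any div whose class list contains it.
    for game in soup.find_all("div", class_="game_summary"):
        date_row = game.find("tr", {"class": "date"})
        results.append({
            "date": date_row.get_text(strip=True) if date_row else None,
            # find() returns None when no such row exists (e.g. a tie).
            "winner": team_name(game.find("tr", {"class": "winner"})),
            "loser": team_name(game.find("tr", {"class": "loser"})),
        })
    return results


# First game: trimmed from the question. Second game: hypothetical tie.
sample_html = """
<div class="game_summary expanded nohover">
<table class="teams">
  <tr class="date"><td colspan="3">Sep 6, 2018</td></tr>
  <tr class="loser"><td><a href="/teams/atl/2018.htm">Atlanta Falcons</a></td>
    <td class="right">12</td>
    <td class="right gamelink"><a href="/boxscores/201809060phi.htm">Final</a></td></tr>
  <tr class="winner"><td><a href="/teams/phi/2018.htm">Philadelphia Eagles</a></td>
    <td class="right">18</td></tr>
</table>
</div>
<div class="game_summary expanded nohover">
<table class="teams">
  <tr class="date"><td colspan="3">Sep 9, 2018</td></tr>
  <tr class="draw"><td><a href="#">Team A</a></td><td class="right">21</td></tr>
  <tr class="draw"><td><a href="#">Team B</a></td><td class="right">21</td></tr>
</table>
</div>
"""

games = summarize_games(sample_html)
for g in games:
    print(g["date"], g["winner"], g["loser"])
```

Returning None for a missing winner or loser leaves it to the caller to decide how ties should be reported, instead of silently dropping those games.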
About "python - Web scraping with Python, unable to access td elements": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52285755/
