我正在尝试从包含多个表的this URL刮取内容。所需的输出将是:
NAME FG% FT% 3PM REB AST STL BLK TO PTS SCORE
Team Jackson (0-8) .4313 .7500 21 71 34 11 12 15 189 1-8-0
Team Keyrouze (4-4) .4441 .8090 31 130 71 18 13 45 373 8-1-0
Nutz Vs. Draymond Green (4-4) .4292 .8769 30 86 66 15 9 28 269 3-6-0
Team Pauls 2 da Wall (3-5) .4784 .8438 40 123 64 18 20 30 316 6-3-0
Team Noey (2-6) .4350 .7679 21 125 62 20 9 33 278 7-2-0
YOU REACH, I TEACH (2-5-1) .4810 .7432 20 114 56 30 7 50 277 2-7-0
Kris Kaman His Pants (5-3) .4328 .8000 20 74 59 20 5 27 238 3-6-0
Duke's Balls In Daniels Face (3-4-1) .5000 .7045 42 139 38 27 22 30 303 6-3-0
Knicks Tape (5-3) .5000 .8152 34 143 92 12 9 47 397 4-5-0
Suck MyDirk (5-3) .4734 .8814 29 106 86 22 17 40 435 5-4-0
In Porzingod We Trust (4-4) .4928 .7222 27 180 95 16 16 46 423 7-2-0
Team Aguilar (6-1-1) .4718 .7053 28 177 65 12 35 48 413 2-7-0
Team Li (7-0-1) .4714 .8118 35 134 74 17 17 47 368 6-3-0
Team Iannetta (4-4) .4527 .7302 22 125 90 20 13 44 288 3-6-0
如果用这种方式格式化表格太困难了,我想知道如何抓取所有表格?我刮所有行的代码是这样的:
tableStats = soup.find('table', {'class': 'tableBody'})
rows = tableStats.findAll('tr')
for row in rows:
print(row.string)
但是它只显示值“ TEAM”,什么也没有显示...为什么它不包含表中的所有行?
谢谢。
最佳答案
代替寻找table
标记,您应该直接寻找具有更高可靠性的class
(例如linescoreTeamRow
)的行。该代码段可以解决问题,
from bs4 import BeautifulSoup
import requests
a = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})
# you will need to isolate elements in the row for the table
for row in rows:
print row.text