I found Greg Reda's blog post about scraping HTML from nba.com:
http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
I tried to work with the code he wrote there:
import requests
import json
url = 'http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr' + \
'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN' + \
'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID' + \
'=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien' + \
'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas' + \
'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='
response = requests.get(url)
response.raise_for_status()
shots = response.json()['resultSets']['rowSet']
avg_percentage = shots['OPP_FG_PCT']
print(avg_percentage)
But it returns:
Traceback (most recent call last):
  File "C:\Python34\nba.py", line 91, in <module>
    avg_percentage = shots['OPP_FG_PCT']
TypeError: list indices must be integers, not str
I know only basic Python so I couldn't figure out how to get a list of integers from the data. Can anybody explain?
Evidently the data structure has changed since Greg Reda wrote that post. Before exploring the data, I recommend that you save it to a file via pickling. That way you don't have to keep hitting the NBA server and waiting for a download each time you modify and rerun the script.
The following script checks for the existence of the pickled data to avoid unnecessary downloading:
import requests
import json
url = 'http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr' + \
'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN' + \
'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID' + \
'=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien' + \
'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas' + \
'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='
print(url)
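import os
import pickle

file_name = 'result_sets.pickled'
if os.path.isfile(file_name):
    # Reuse the cached copy instead of downloading again.
    with open(file_name, 'rb') as f:
        result_sets = pickle.load(f)
else:
    # No cached copy yet: download once and save the result for later runs.
    response = requests.get(url)
    response.raise_for_status()
    result_sets = response.json()['resultSets']
    with open(file_name, 'wb') as f:
        pickle.dump(result_sets, f)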
print(result_sets.keys())
print(result_sets['headers'][1])
print(result_sets['rowSet'][0])
print(len(result_sets['rowSet']))
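If you end up caching more than one download, the same check can be wrapped in a small helper. This is only a sketch: load_or_fetch and fake_fetch are names I made up for illustration, and the fake download stands in for the requests.get call so the example runs without touching the NBA server.

```python
import os
import pickle
import tempfile

def load_or_fetch(file_name, fetch):
    """Return the pickled object in file_name; call fetch() only if the file is missing."""
    if os.path.isfile(file_name):
        with open(file_name, 'rb') as f:
            return pickle.load(f)
    data = fetch()
    with open(file_name, 'wb') as f:
        pickle.dump(data, f)
    return data

# Demonstration with a fake download so the sketch runs offline.
calls = []

def fake_fetch():
    calls.append(1)  # record each simulated download
    return {'name': 'demo', 'rowSet': [[1, 'Team A']]}

cache_path = os.path.join(tempfile.mkdtemp(), 'demo.pickled')
first = load_or_fetch(cache_path, fake_fetch)   # file missing: fake_fetch runs
second = load_or_fetch(cache_path, fake_fetch)  # file present: read from disk
print(len(calls), first == second)
```

In the real script, fetch would be a function that performs the requests.get call and returns response.json()['resultSets'].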
Once you have result_sets in hand, you can examine the data. If you print it, you'll see that it's a dictionary. You can extract the dictionary keys:
print(result_sets.keys())
Currently the keys are 'headers', 'rowSet', and 'name'. You can inspect the headers:
print(result_sets['headers'])
I probably know less about these statistics than you do. However, by looking at the data, I've been able to figure out that result_sets['rowSet'] contains 30 rows of 23 elements each. The 23 columns are identified by result_sets['headers'][1]. Try this:
print(result_sets['headers'][1])
That will show you the 23 column names. Now take a look at the first row of team data:
print(result_sets['rowSet'][0])
Now you see the 23 values reported for the Atlanta Hawks. You can iterate over the rows in result_sets['rowSet'] to extract whatever values interest you and to compute aggregate information such as totals and averages.
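Each row is a plain list, so it has to be indexed by position rather than by column name. That is what the TypeError in the question is complaining about. Here is a rough sketch of averaging one column by looking up its position first. The data below is made up and much smaller than the real thing, and I'm assuming you have already pulled the column names out of result_sets['headers'][1] into a flat list of strings. Print that entry first to see how the names are actually stored. column_average is a hypothetical helper, not something from the original post.

```python
# Hypothetical stand-in for the downloaded data: three columns and three rows
# instead of the 23 columns and 30 rows described above. The numbers are made up.
column_names = ['TEAM_ID', 'TEAM_NAME', 'OPP_FG_PCT']
rows = [
    [1, 'Team A', 0.40],
    [2, 'Team B', 0.50],
    [3, 'Team C', 0.60],
]

def column_average(rows, column_names, name):
    """Average the first column called `name`, skipping missing (None) values."""
    index = column_names.index(name)  # position of the first matching column
    values = [row[index] for row in rows if row[index] is not None]
    return sum(values) / len(values)

average = column_average(rows, column_names, 'OPP_FG_PCT')
print(round(average, 3))
```

Note that list.index returns only the first match, so if a column name appears more than once in the real headers you would need to pick the position you want by hand.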