When I run the following code, I get a "list index out of range" error:
import requests
from lxml.html import fromstring

def get_values():
    print('executing get_values...')
    url = 'https://sports.yahoo.com/nba/stats/weekly/?sortStatId=POINTS_PER_GAME&selectedTable=0'
    response = requests.get(url)
    parser = fromstring(response.text)
    for i in parser.xpath('//tbody/tr')[:100]:
        FGM = i.xpath('.//td[4]/span/text()')[0]  # This runs without error even though it has a similar XPath.
        print('FGM: ' + FGM)
        G = i.xpath('.//td[2]/span/text()')[0]
        print(G)

values = get_values()
When I run the code, I get the following error:
G = i.xpath('.//td[2]/span/text()')[0]
IndexError: list index out of range
I tried to debug it with the following statements:
print(parser.xpath('//tbody/tr/td[2]/span/text()'))  # Returns ['4', '4', '3', '3', '3', '4', '4', '3', '2', '4', '3']
print(parser.xpath('//tbody/tr/td[2]/span/text()')[0])  # Returns 4
print(len(parser.xpath('//tbody/tr/td[2]/span/text()')[0]))  # Returns 1 (the length of the string '4')
The output shows the expected values, so I'm not sure why it doesn't work. Any help would be appreciated!
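One way to check the cause offline is to compare the number of rows with the number of `td[2]/span` matches and list the rows that have no `<span>`. The sketch below uses a small, made-up HTML table (not Yahoo's actual markup, which may differ or may have changed); in its second row the value sits directly inside the `<td>`:

```python
from lxml.html import fromstring

# Hypothetical table: in the second row the value is directly inside td[2],
# with no <span> around it.
HTML = """<table><tbody>
<tr><td>Player A</td><td><span>4</span></td><td>x</td><td><span>10</span></td></tr>
<tr><td>Player B</td><td>3</td><td>x</td><td><span>8</span></td></tr>
<tr><td>Player C</td><td><span>2</span></td><td>x</td><td><span>6</span></td></tr>
</tbody></table>"""

parser = fromstring(HTML)
rows = parser.xpath('//tbody/tr')
matches = parser.xpath('//tbody/tr/td[2]/span/text()')
print(len(rows), len(matches))  # 3 2

# 0-based indexes of rows whose second cell has no <span> child.
missing = [n for n, row in enumerate(rows) if not row.xpath('./td[2]/span')]
print(missing)  # [1]
```

A flat list of matches like the one printed above looks correct on its own; it is only when it is compared against the row count that the rows without a `<span>` show up. For each row in `missing`, `i.xpath('.//td[2]/span/text()')` returns an empty list, so `[0]` raises `IndexError`.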
Best answer
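It fails because the second <td> does not always contain a <span>. In rows where the value sits directly inside the <td>, './/td[2]/span/text()' returns an empty list, and indexing it with [0] raises the IndexError. This should work: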
import requests
from lxml.html import fromstring

def get_values():
    print('executing get_values...')
    url = 'https://sports.yahoo.com/nba/stats/weekly/?sortStatId=POINTS_PER_GAME&selectedTable=0'
    response = requests.get(url)
    parser = fromstring(response.text)
    for i in parser.xpath('//tbody/tr')[:100]:
        FGM = i.xpath('.//td[4]/span/text()')[0]  # This runs without error even though it has a similar XPath.
        print('FGM: ' + FGM)
        # <--- Changed this: match text directly inside td[2] OR inside its <span>
        G = i.xpath('.//td[2]/text()|.//td[2]/span/text()')[0]
        print(G)

values = get_values()
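A variant that does not depend on whether a `<span>` is present is to read the whole cell with lxml's `text_content()`. The sketch below uses the same kind of hypothetical table as a stand-in for the real page; `cell_text` is an illustrative helper name, not part of lxml, and it returns a default instead of raising when the cell is missing or empty:

```python
from lxml.html import fromstring

# Hypothetical table: the second row has no <span> in its second cell.
HTML = """<table><tbody>
<tr><td>Player A</td><td><span>4</span></td><td>x</td><td><span>10</span></td></tr>
<tr><td>Player B</td><td>3</td><td>x</td><td><span>8</span></td></tr>
<tr><td>Player C</td><td><span>2</span></td><td>x</td><td><span>6</span></td></tr>
</tbody></table>"""

def cell_text(row, index, default=None):
    """Return the stripped text of the row's index-th <td> (1-based), or default.

    text_content() concatenates all text inside the cell, so it works whether
    or not the value is wrapped in a <span>.
    """
    cells = row.xpath(f'./td[{index}]')
    if not cells:
        return default  # this row has no such cell
    text = cells[0].text_content().strip()
    return text or default  # treat an empty cell like a missing one

rows = fromstring(HTML).xpath('//tbody/tr')
# Compare the answer's union expression with the text_content() helper.
results = [(row.xpath('.//td[2]/text()|.//td[2]/span/text()')[0], cell_text(row, 2))
           for row in rows]
print(results)
print(cell_text(rows[0], 9))  # None: the row has no ninth cell
```

Returning a default such as None keeps one unusual row from aborting the whole loop; whether to skip, log, or stop on such rows is a design choice for the caller.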