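I'm scraping data from the NBA website with BeautifulSoup. I want to build a list containing each player's name, player bio link, height, weight, and DOB.
The name and player bio link are scraped successfully, but the other fields are not.
Link: https://in.global.nba.com/playerindex/
Also, I noticed that my Spyder kernel dies every time I try to open a variable in the Variable Explorer.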
from bs4 import BeautifulSoup

# `soup` (the parsed player index page) and `driver` (a Selenium WebDriver)
# are created earlier in the script (not shown in the question).
names = []
tr = soup.find_all("tr", class_="ng-scope")
for i in tr:
    td = i.find("td", class_="left player")
    anchor = td.find("a", class_="player-name ng-isolate-scope")
    href = td.find("a")["data-ng-href"]
    span = anchor.find("span", class_="ng-binding")
    spans = anchor.find("span", class_="ng-binding").findNextSibling().findNextSibling()
    name = span.text + " " + spans.text
    linktoplayer = 'https://in.global.nba.com' + href
    driver.get(linktoplayer)
    html_docs = driver.page_source
    soups = BeautifulSoup(html_docs, 'lxml')
    div = soups.find("div", class_="player-info-right hidden-sm")
    p = div.find("p", class_="ng-binding")
    upperspan = p.find("span", class_="ng-binding")
    innerspan = upperspan.find("span", class_="ng-binding")
    height = innerspan.text
    print(height)
    weight = innerspan.next_sibling.next_sibling.next_sibling
    dob = upperspan.next_sibling.next_sibling.next_sibling
    dob = dob.split(" ")[1]
    bio = {
        "name": name,
        "href": href,
        "height": height,
        "weight": weight,
        "dob": dob
    }
    names.append(bio)
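Best answer
Check the site in your browser's Network tab: the page loads its player data from an API that returns JSON, so you can request that JSON directly.
For example: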
import requests

jsonData = requests.get("https://in.global.nba.com/stats2/league/playerlist.json?locale=en").json()
for x in jsonData['payload']['players']:
    # print player profile data
    print(x['playerProfile'])
    # print team profile data
    print(x['teamProfile'])
Output:
Player profile data:
{'code': 'ivica_zubac', 'country': 'Croatia', 'countryEn': 'Croatia', 'displayAffiliation': 'Croatia', 'displayName': 'Ivica Zubac', 'displayNameEn': 'Ivica Zubac', 'dob': '858661200000', 'draftYear': '2016', 'experience': '3', 'firstInitial': 'I', 'firstName': 'Ivica', 'firstNameEn': 'Ivica', 'height': '7-1', 'jerseyNo': '40', 'lastName': 'Zubac', 'lastNameEn': 'Zubac', 'leagueId': '00', 'playerId': '1627826', 'position': 'C', 'schoolType': '', 'weight': '240 lbs'}
...
Team profile data:
{'abbr': 'LAC', 'city': 'LA', 'cityEn': 'LA', 'code': 'clippers', 'conference': 'Western', 'displayAbbr': 'LAC', 'displayConference': 'Western', 'division': 'Pacific', 'id': '1610612746', 'isAllStarTeam': False, 'isLeagueTeam': True, 'leagueId': '00', 'name': 'Clippers', 'nameEn': 'Clippers'}
...
The `dob` value is a timestamp in milliseconds since the Unix epoch. To convert it to a date:
For example:
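import datetime

ms = '858661200000'
# fromtimestamp() without tz uses the machine's local time zone, which can
# shift the date by a day; interpret the timestamp in UTC instead.
dob = datetime.datetime.fromtimestamp(int(ms) / 1000.0, tz=datetime.timezone.utc).date()
print(dob)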
Output:
1997-03-18
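Putting the two steps together gives a minimal sketch that builds the list the question asked for. It assumes the payload keeps the `payload` → `players` → `playerProfile` layout and the field names shown in the sample output above. The sample profile contains no bio-page URL, so the sketch keeps the player's `code` and `playerId` instead of guessing a URL format. `build_bios` and `ms_to_date` are hypothetical helper names, and dates are converted in UTC.

```python
import datetime


def ms_to_date(ms):
    """Convert a millisecond Unix timestamp (e.g. '858661200000') to a date, in UTC."""
    seconds = int(ms) / 1000
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc).date()


def build_bios(json_data):
    """Collect name, identifiers, height, weight and DOB for every player.

    Assumes the playerlist.json layout shown in the sample output:
    json_data["payload"]["players"][i]["playerProfile"].
    """
    bios = []
    for player in json_data["payload"]["players"]:
        profile = player["playerProfile"]
        bios.append({
            "name": profile["displayName"],
            "code": profile["code"],          # slug, e.g. 'ivica_zubac'
            "player_id": profile["playerId"],
            "height": profile["height"],      # feet-inches string, e.g. '7-1'
            "weight": profile["weight"],      # e.g. '240 lbs'
            "dob": ms_to_date(profile["dob"]).isoformat(),
        })
    return bios


if __name__ == "__main__":
    # Offline example built from the sample profile in the answer's output.
    sample = {"payload": {"players": [{"playerProfile": {
        "displayName": "Ivica Zubac", "code": "ivica_zubac", "playerId": "1627826",
        "height": "7-1", "weight": "240 lbs", "dob": "858661200000",
    }}]}}
    print(build_bios(sample))
    # [{'name': 'Ivica Zubac', 'code': 'ivica_zubac', 'player_id': '1627826', 'height': '7-1', 'weight': '240 lbs', 'dob': '1997-03-18'}]
```

Against the live endpoint, the call would be `names = build_bios(jsonData)`, using the `jsonData` fetched in the answer. Every value in the resulting dicts is a plain Python string. In the question's loop, by contrast, `weight` is whatever BeautifulSoup node `next_sibling` returns.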
Source: "python - Spyder kernel dies whenever I run this code", Stack Overflow: https://stackoverflow.com/questions/57088484/