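I am trying to scrape this NBA website: https://stats.nba.com/team/1610612738/. What I want to do is extract each player's name, NO, POS and all the other information for every player. The problem is that I can't find, or my code can't find, the <div ng-view> that is the parent of the <nba-stat-table> containing the table.
So far, my code is: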
from selenium import webdriver
from bs4 import BeautifulSoup

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')
    url = 'https://stats.nba.com/team/1610612738/'
    driver.get(url)
    data = driver.page_source.encode('utf-8')
    soup = BeautifulSoup(data, 'lxml')
    div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
    print(div1.find_all('div'))

get_Player()
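Best answer
Use the JSON endpoint that the page itself calls to fetch that content. The response is easier and lighter to work with, and you don't need Selenium. You can find the request in the Network tab of your browser's developer tools.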
import requests
import pandas as pd

# Roster endpoint the team page calls (visible in the Network tab)
url = 'https://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2018-19&TeamID=1610612738'
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).json()

# First result set: 'headers' are the column names, 'rowSet' holds one list per player
players_info = r['resultSets'][0]
df = pd.DataFrame(players_info['rowSet'], columns=players_info['headers'])
print(df.head())
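If you only want a few fields per player (name, number, position), the same headers/rowSet pairing can be done without pandas. The following is a minimal sketch, not the site's documented API: the helper names rows_as_dicts and pick_fields are hypothetical, the column names PLAYER, NUM and POSITION are an assumption about what the roster headers contain (check players_info['headers'] on the real response), and the sample payload uses placeholder values rather than real roster data.

```python
def rows_as_dicts(payload, result_set_index=0):
    """Pair every row of one result set with that set's header names."""
    result_set = payload["resultSets"][result_set_index]
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


def pick_fields(records, fields):
    """Keep only the requested fields, silently skipping absent ones."""
    return [{f: rec[f] for f in fields if f in rec} for rec in records]


# Hypothetical payload with the same shape as the JSON response;
# the header names and values here are placeholders, not real data.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER", "NUM", "POSITION"],
            "rowSet": [
                ["Player A", "0", "F"],
                ["Player B", "11", "G"],
            ],
        }
    ]
}

records = rows_as_dicts(sample_payload)
for rec in pick_fields(records, ["PLAYER", "NUM", "POSITION"]):
    print(rec)
```

With the live response you would pass r (the parsed JSON from the request) in place of sample_payload.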