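I'm trying to scrape this NBA site: https://stats.nba.com/team/1610612738/. For each player I want to extract the player's name, NO, POS and all the other information. The problem is that I can't find, or my code can't find, the <nba-stat-table> element, the parent of the <div ng-view> where the table lives.

My code so far: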

from selenium import webdriver
from bs4 import BeautifulSoup

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')

    url = 'https://stats.nba.com/team/1610612738/'

    driver.get(url)

    data = driver.page_source.encode('utf-8')

    soup = BeautifulSoup(data, 'lxml')

    div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
    print(div1.find_all('div'))

get_Player()

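Best answer

Use the JSON endpoint that the page itself calls to fetch this content. The response is easier and lighter to work with, and it doesn't require Selenium. You can find the request in the Network tab of your browser's developer tools.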

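import requests
import pandas as pd

# JSON endpoint the team page calls to load its roster (visible in the Network tab)
url = 'https://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2018-19&TeamID=1610612738'

# Send a browser-like User-Agent header with the request and decode the JSON body
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).json()

# The first result set holds the roster: 'headers' are column names, 'rowSet' the rows
players_info = r['resultSets'][0]
df = pd.DataFrame(players_info['rowSet'], columns=players_info['headers'])
print(df.head())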
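
If you only need a few fields per player, the same headers/rowSet layout can be unpacked without pandas. The snippet below is a minimal sketch, not the site's documented API: the helper names rows_to_records and pick_columns are made up for illustration, the sample payload is placeholder data shaped like the response above, and the column names PLAYER, NUM and POSITION are an assumption, so check players_info['headers'] in the real response before relying on them.

```python
def rows_to_records(result_set):
    """Turn one result set ('headers' + 'rowSet') into a list of dicts."""
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


def pick_columns(records, wanted):
    """Keep only the wanted keys; keys missing from a record become None."""
    return [{key: rec.get(key) for key in wanted} for rec in records]


# Placeholder payload mimicking the structure of the JSON response.
# Column names and values are assumptions, not real roster data.
sample = {
    "resultSets": [
        {
            "headers": ["PLAYER", "NUM", "POSITION"],
            "rowSet": [
                ["Player A", "0", "F"],
                ["Player B", "11", "G"],
            ],
        }
    ]
}

records = rows_to_records(sample["resultSets"][0])
selected = pick_columns(records, ["PLAYER", "NUM", "POSITION"])

for rec in selected:
    print(rec["PLAYER"], rec["NUM"], rec["POSITION"])
```

With the live response, you would pass r['resultSets'][0] to rows_to_records instead of the sample payload.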


