Problem description
I am trying to scrape data from Yahoo Finance, but I can only get data from some of the tables on the statistics page at https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL. I can get data from the top table and the left tables, but I can't figure out why the following program won't scrape the right-hand tables, which contain values like Beta (5Y Monthly), 52 Week Change, Last Split Factor and Last Split Date:
import requests
import pandas as pd
from bs4 import BeautifulSoup

# NOTE: `headers` is used below but is not defined in the snippet as posted.
stockStatDict = {}
stockSymbol = 'AAPL'
URL = 'https://finance.yahoo.com/quote/' + stockSymbol + '/key-statistics?p=' + stockSymbol
page = requests.get(URL, headers=headers, timeout=5)
soup = BeautifulSoup(page.content, 'html.parser')
# Find all tables on the page
stock_data = soup.find_all('table')
# stock_data will contain multiple tables, next we examine each table one by one
for table in stock_data:
    # Scrape all table rows into variable trs
    trs = table.find_all('tr')
    for tr in trs:
        print('tr: ', tr)
        print()
        # Scrape all table data tags into variable tds
        tds = tr.find_all('td')
        print('tds: ', tds)
        print()
        print()
        if len(tds) > 0:
            # Index 0 of tds will contain the measurement
            # Index 1 of tds will contain the value
            # Insert measurement and value into stockStatDict
            stockStatDict[tds[0].get_text()] = [tds[1].get_text()]
stock_stat_df = pd.DataFrame(data=stockStatDict)
print(stock_stat_df.head())
print(stock_stat_df.info())
Any idea why this code isn't retrieving those fields and values?
Recommended answer
To get the correct response from the Yahoo server, set the User-Agent HTTP header:
import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL"
# Send a browser-like User-Agent so the Yahoo server returns the correct response
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for t in soup.select("table"):
    # Only visit rows that contain at least one <td> cell
    for tr in t.select("tr:has(td)"):
        # Remove <sup> elements so their text does not end up in the label
        for sup in tr.select("sup"):
            sup.extract()
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        # Print only rows that are a label/value pair
        if len(tds) == 2:
            print("{:<50} {}".format(*tds))
Prints:
Market Cap (intraday)                              2.34T
Enterprise Value                                   2.36T
Trailing P/E                                       31.46
Forward P/E                                        26.16
PEG Ratio (5 yr expected)                          1.51
Price/Sales(ttm)                                   7.18
Price/Book(mrq)                                    33.76
Enterprise Value/Revenue                           7.24
Enterprise Value/EBITDA                            23.60
Beta (5Y Monthly)                                  1.21
52-Week Change                                     50.22%
S&P500 52-Week Change                              38.38%
52 Week High                                       145.09
52 Week Low                                        89.14
50-Day Moving Average                              129.28
200-Day Moving Average                             129.32
Avg Vol (3 month)                                  82.16M
Avg Vol (10 day)                                   64.25M
Shares Outstanding                                 16.69B
Implied Shares Outstanding                         N/A
Float                                              16.67B
% Held by Insiders                                 0.07%
% Held by Institutions                             58.54%
Shares Short (Jun 14, 2021)                        108.94M
Short Ratio (Jun 14, 2021)                         1.52
Short % of Float (Jun 14, 2021)                    0.65%
Short % of Shares Outstanding (Jun 14, 2021)       0.65%
Shares Short (prior month May 13, 2021)            94.75M
Forward Annual Dividend Rate                       0.88
Forward Annual Dividend Yield                      0.64%
Trailing Annual Dividend Rate                      0.82
Trailing Annual Dividend Yield                     0.60%
5 Year Average Dividend Yield                      1.32
Payout Ratio                                       18.34%
Dividend Date                                      May 12, 2021
Ex-Dividend Date                                   May 06, 2021
Last Split Factor                                  4:1
Last Split Date                                    Aug 30, 2020
Fiscal Year Ends                                   Sep 25, 2020
Most Recent Quarter(mrq)                           Mar 26, 2021
Profit Margin                                      23.45%
Operating Margin(ttm)                              27.32%
Return on Assets(ttm)                              16.90%
Return on Equity(ttm)                              103.40%
Revenue(ttm)                                       325.41B
Revenue Per Share(ttm)                             19.14
Quarterly Revenue Growth(yoy)                      53.60%
Gross Profit(ttm)                                  104.96B
EBITDA                                             99.82B
Net Income Avi to Common(ttm)                      76.31B
Diluted EPS(ttm)                                   4.45
Quarterly Earnings Growth(yoy)                     110.10%
Total Cash(mrq)                                    69.83B
Total Cash Per Share(mrq)                          4.18
Total Debt(mrq)                                    134.74B
Total Debt/Equity(mrq)                             194.78
Current Ratio(mrq)                                 1.14
Book Value Per Share(mrq)                          4.15
Operating Cash Flow(ttm)                           99.59B
Levered Free Cash Flow(ttm)                        80.12B
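The values printed above are all strings ("2.34T", "50.22%", "N/A"), so if the goal is a DataFrame as in the question, it may help to convert them to numbers first. The following is only a minimal sketch under some assumptions: the helper name parse_value is hypothetical, the suffixes T/B/M/k are assumed to mean 1e12/1e9/1e6/1e3, percentages are returned as fractions, and anything that does not look numeric (dates, split ratios such as 4:1) is left as a string. The sample rows are copied from the output above rather than fetched from the page.

```python
from typing import Union

# Assumed meaning of the magnitude suffixes seen in the scraped values.
SUFFIXES = {"T": 1e12, "B": 1e9, "M": 1e6, "k": 1e3}


def parse_value(text: str) -> Union[float, str, None]:
    """Hypothetical helper: turn a scraped value string into a number if possible."""
    original = text.strip()
    s = original.replace(",", "")  # drop thousands separators before parsing
    if s in ("", "N/A"):
        return None  # missing value
    try:
        if s.endswith("%"):
            return float(s[:-1]) / 100  # "50.22%" becomes a fraction
        if s[-1] in SUFFIXES:
            return float(s[:-1]) * SUFFIXES[s[-1]]  # "2.34T" is scaled by 1e12
        return float(s)
    except ValueError:
        # Not numeric (for example a date or "4:1"): keep the original text.
        return original


# Sample label/value pairs taken from the printed output above.
rows = [
    ("Market Cap (intraday)", "2.34T"),
    ("Beta (5Y Monthly)", "1.21"),
    ("52-Week Change", "50.22%"),
    ("Implied Shares Outstanding", "N/A"),
    ("Last Split Factor", "4:1"),
    ("Last Split Date", "Aug 30, 2020"),
]
stats = {label: parse_value(value) for label, value in rows}
print(stats)
```

In the answer's loop, the same helper could be applied to tds[1] when len(tds) == 2, storing the result in a dict keyed by tds[0] instead of printing it.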