Problem description

I am trying to scrape data from Yahoo Finance, but I can only get data from some of the tables on the statistics page at https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL. I can get data from the top table and the left-hand tables, but I can't figure out why the following program won't scrape the right-hand tables, which hold values such as Beta (5Y Monthly), 52 Week Change, Last Split Factor and Last Split Date.

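import pandas as pd
import requests
from bs4 import BeautifulSoup

stockStatDict = {}

stockSymbol = 'AAPL'
URL = 'https://finance.yahoo.com/quote/' + stockSymbol + '/key-statistics?p=' + stockSymbol
# NOTE: `headers` is not defined in the snippet as posted; it is presumably set elsewhere in the script
page = requests.get(URL, headers=headers, timeout=5)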


soup = BeautifulSoup(page.content, 'html.parser')

# Find all tables on the page
stock_data = soup.find_all('table')

# stock_data will contain multiple tables, next we examine each table one by one
for table in stock_data:

    # Scrape all table rows into variable trs
    trs = table.find_all('tr')


    for tr in trs:
        print('tr: ', tr)
        print()
        # Scrape all table data tags into variable tds
        tds = tr.find_all('td')
        print('tds: ', tds)
        print()
        print()

        if len(tds) >= 2:
            # Index 0 of tds will contain the measurement
            # Index 1 of tds will contain the value
            # Insert measurement and value into stockStatDict
            stockStatDict[tds[0].get_text()] = [tds[1].get_text()]

stock_stat_df = pd.DataFrame(data=stockStatDict)
print(stock_stat_df.head())
print(stock_stat_df.info())

Any idea why this code isn't retrieving those fields and values?

Recommended answer

To get the correct response from the Yahoo server, set the User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup


url = "https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for t in soup.select("table"):
    for tr in t.select("tr:has(td)"):
        for sup in tr.select("sup"):
            sup.extract()
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        if len(tds) == 2:
            print("{:<50} {}".format(*tds))

Prints:

Market Cap (intraday)                              2.34T
Enterprise Value                                   2.36T
Trailing P/E                                       31.46
Forward P/E                                        26.16
PEG Ratio (5 yr expected)                          1.51
Price/Sales(ttm)                                   7.18
Price/Book(mrq)                                    33.76
Enterprise Value/Revenue                           7.24
Enterprise Value/EBITDA                            23.60
Beta (5Y Monthly)                                  1.21
52-Week Change                                     50.22%
S&P500 52-Week Change                              38.38%
52 Week High                                       145.09
52 Week Low                                        89.14
50-Day Moving Average                              129.28
200-Day Moving Average                             129.32
Avg Vol (3 month)                                  82.16M
Avg Vol (10 day)                                   64.25M
Shares Outstanding                                 16.69B
Implied Shares Outstanding                         N/A
Float                                              16.67B
% Held by Insiders                                 0.07%
% Held by Institutions                             58.54%
Shares Short (Jun 14, 2021)                        108.94M
Short Ratio (Jun 14, 2021)                         1.52
Short % of Float (Jun 14, 2021)                    0.65%
Short % of Shares Outstanding (Jun 14, 2021)       0.65%
Shares Short (prior month May 13, 2021)            94.75M
Forward Annual Dividend Rate                       0.88
Forward Annual Dividend Yield                      0.64%
Trailing Annual Dividend Rate                      0.82
Trailing Annual Dividend Yield                     0.60%
5 Year Average Dividend Yield                      1.32
Payout Ratio                                       18.34%
Dividend Date                                      May 12, 2021
Ex-Dividend Date                                   May 06, 2021
Last Split Factor                                  4:1
Last Split Date                                    Aug 30, 2020
Fiscal Year Ends                                   Sep 25, 2020
Most Recent Quarter(mrq)                           Mar 26, 2021
Profit Margin                                      23.45%
Operating Margin(ttm)                              27.32%
Return on Assets(ttm)                              16.90%
Return on Equity(ttm)                              103.40%
Revenue(ttm)                                       325.41B
Revenue Per Share(ttm)                             19.14
Quarterly Revenue Growth(yoy)                      53.60%
Gross Profit(ttm)                                  104.96B
EBITDA                                             99.82B
Net Income Avi to Common(ttm)                      76.31B
Diluted EPS(ttm)                                   4.45
Quarterly Earnings Growth(yoy)                     110.10%
Total Cash(mrq)                                    69.83B
Total Cash Per Share(mrq)                          4.18
Total Debt(mrq)                                    134.74B
Total Debt/Equity(mrq)                             194.78
Current Ratio(mrq)                                 1.14
Book Value Per Share(mrq)                          4.15
Operating Cash Flow(ttm)                           99.59B
Levered Free Cash Flow(ttm)                        80.12B
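
Since the question builds a pandas DataFrame rather than printing, the same row-parsing logic can be collected into a table. The following is only a minimal sketch: the helper name `parse_stat_tables`, the column names `Measurement` and `Value`, and the `sample_html` markup are all assumptions made for illustration, and the sample is a hypothetical stand-in for the page content rather than Yahoo's actual HTML.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_stat_tables(html):
    """Return {label: value} for every two-cell row in every table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    stats = {}
    for table in soup.select("table"):
        for tr in table.select("tr:has(td)"):
            # Drop footnote markers such as <sup>3</sup> from the labels
            for sup in tr.select("sup"):
                sup.extract()
            tds = [td.get_text(strip=True) for td in tr.select("td")]
            # Keep only label/value rows, as in the answer above
            if len(tds) == 2:
                stats[tds[0]] = tds[1]
    return stats


# Hypothetical sample markup standing in for the downloaded page
sample_html = """
<table>
  <tr><td>Beta (5Y Monthly)</td><td>1.21</td></tr>
  <tr><td>Last Split Factor <sup>2</sup></td><td>4:1</td></tr>
</table>
<table>
  <tr><th>Header only</th></tr>
  <tr><td>Last Split Date <sup>3</sup></td><td>Aug 30, 2020</td></tr>
</table>
"""

stats = parse_stat_tables(sample_html)
# One row per statistic, instead of one column per statistic as in the question
stock_stat_df = pd.DataFrame(list(stats.items()), columns=["Measurement", "Value"])
print(stock_stat_df)
```

With a live page, the string passed to `parse_stat_tables` would be the body of the response fetched with the User-Agent header set as in the answer. Storing one row per statistic keeps the frame narrow, which may be easier to inspect than the one-column-per-statistic layout used in the question.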

08-11 12:30