This article looks at how to handle missing web-page values when scraping data with BeautifulSoup on Python 3.6. The answer below should be a useful reference for anyone hitting the same problem.
Problem description
I am using the script below to scrape the "STOCK QUOTE" data from http://fortune.com/fortune500/xcel-energy/, but it returns blank values.
I have also tried a Selenium driver, but I get the same result. Please help with this.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
r = requests.get('http://fortune.com/fortune500/xcel-energy/')
soup = bs(r.content, 'lxml')  # also tried: 'html.parser'
data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote row'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)
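For reference, the extraction loop itself is sound: run against a static snippet of the expected markup (the snippet below is assumed from the selectors in the script, not copied from the live page), it fills the DataFrame. That the same loop comes back blank against the live page suggests the "stock-quote" markup is rendered by JavaScript and is simply not present in the raw HTML that requests downloads.

```python
# Offline sketch: the same span-extraction loop, run against a static,
# hand-written snippet of the markup the script expects. If it works here
# but not on the live page, the live markup is being rendered client-side.
from bs4 import BeautifulSoup
import pandas as pd

html = """
<div class="stock-quote row">
  <ul>
    <li><span>Previous Close:</span><span>48.95</span></li>
    <li><span>Market Cap:</span><span>24.9</span><span>B</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
data = pd.DataFrame(columns=['C1', 'C2', 'C3', 'C4'],
                    dtype='object', index=range(2))
for table in soup.find_all('div', {'class': 'stock-quote'}):
    for row_marker, row in enumerate(table.find_all('li')):
        for column_marker, column in enumerate(row.find_all('span')):
            data.iat[row_marker, column_marker] = column.get_text()

print(data)
```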
Output obtained:
C1 C2 C3 C4
0 Previous Close: NaN NaN
1 Market Cap: NaNB NaN B
2 Next Earnings Date: NaN NaN
3 High: NaN NaN
4 Low: NaN NaN
5 52 Week High: NaN NaN
6 52 Week Low: NaN NaN
7 52 Week Change %: 0.00 NaN NaN
8 P/E Ratio: n/a NaN NaN
9 EPS: NaN NaN
10 Dividend Yield: n/a NaN NaN
Recommended answer
It looks like the data you are looking for is available at this API endpoint:
import requests
response = requests.get("http://fortune.com/api/v2/company/xel/expand/1")
data = response.json()
print(data['ticker'])
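Once the JSON is in hand, the fields you need can be flattened into a DataFrame. The payload below is a hypothetical stand-in for illustration only; the real endpoint's field names would need to be checked against an actual response.

```python
# Hedged sketch: flatten a JSON payload into a two-column DataFrame.
# The dict below is a made-up stand-in for the API response, not the
# real shape returned by the endpoint.
import pandas as pd

payload = {  # hypothetical structure, for illustration
    'ticker': {
        'previousClose': 48.95,
        'marketCap': '24.9B',
        'peRatio': None,
    }
}

rows = [(field, value) for field, value in payload['ticker'].items()]
df = pd.DataFrame(rows, columns=['Field', 'Value'])
print(df)
```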
FYI, when opening the page in a Selenium-automated browser, you just need to make sure you wait for the desired data to appear before parsing the HTML. Working code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
url = 'http://fortune.com/fortune500/xcel-energy/'
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".stock-quote")))
page_source = driver.page_source
driver.close()
# HTML parsing part
soup = BeautifulSoup(page_source, 'lxml')  # also tried: 'html.parser'
data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)
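A small refinement worth considering: rather than writing into a pre-sized frame with .iat (which raises IndexError if the page ever has more rows or spans than expected), collect each li's spans into a list and let pandas size the frame. A minimal offline sketch, again using an assumed static snippet:

```python
# Alternative collection sketch: gather (label, value) pairs per <li> and
# build the DataFrame from them, instead of indexing into a fixed-size frame.
# The snippet markup is assumed, as above.
from bs4 import BeautifulSoup
import pandas as pd

html = """
<div class="stock-quote row">
  <li><span>High:</span><span>49.10</span></li>
  <li><span>Low:</span><span>48.20</span></li>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = [[span.get_text() for span in li.find_all('span')]
        for li in soup.select('div.stock-quote li')]
df = pd.DataFrame(rows, columns=['Field', 'Value'])
print(df)
```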
That concludes this article on missing web-page values when scraping data with BeautifulSoup on Python 3.6. I hope the recommended answer helps, and thanks for your support!