Problem description
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
url = "https://www.google.com/finance/historical?cid=207437&startdate=Jan%201%2C%201971&enddate=Jul%201%2C%202017&start={0}&num=30"
how_many_pages=3
start=0
for i in range(how_many_pages):
    new_url = url.format(start)
    page = requests.get(new_url)
    soup = BeautifulSoup(page.content, "lxml")
    table = soup.find_all('table', class_='gf-table historical_price')[0]
    columns_header = [th.getText() for th in table.findAll('tr')[0].findAll('th')]
    data_rows=table.findAll('tr')[1:]
    data=[[td.getText() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]
    if start == 0:
        final_df = pd.DataFrame(data, columns=columns_header)
    else:
        df = pd.DataFrame(data, columns=columns_header)
        final_df = pd.concat([final_df, df],axis=0)
    start += 30
final_df.to_csv('nse_data.csv', sep='\t', encoding='utf-8')
final_df.columns = ['Date']
final_df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', utc=True)
df.plot(x='Date', y='Close')
plt.savefig('foo.png')
The downloaded data is in the following format:
"Date
" "Open
" "High
" "Low
" "Close
" "Volume
"
0 "Jun 30, 2017
" "9,478.50
" "9,535.80
" "9,448.75
" "9,520.90
" "-
"
1 "Jun 29, 2017
" "9,522.95
" "9,575.80
" "9,493.80
" "9,504.10
" "-
For the time being I only want to plot Date (on the x-axis) against Close (on the y-axis).
But I get this error:
ValueError: Length mismatch: Expected axis has 6 elements, new values have 1 elements
Recommended answer
- Your headers and data contain newline characters.
print(final_df.columns)
returns:
Index(['Date\n', 'Open\n', 'High\n', 'Low\n', 'Close\n', 'Volume\n'], dtype='object')
Use rstrip to get rid of them:
columns_header = [th.getText().rstrip() for th in table.findAll('tr')[0].findAll('th')]
and
data = [[td.getText().rstrip() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]
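If the DataFrame has already been built from the unstripped text, the same cleanup can be done afterwards in pandas. The following is only a sketch on a two-column sample taken from the output shown in the question; the variable name `scraped` is made up for illustration.

```python
import pandas as pd

# Sample rows mimicking the scraped table: every cell ends with "\n".
scraped = pd.DataFrame(
    [["Jun 30, 2017\n", "9,520.90\n"], ["Jun 29, 2017\n", "9,504.10\n"]],
    columns=["Date\n", "Close\n"],
)

# Strip whitespace (including newlines) from the column labels.
scraped.columns = scraped.columns.str.strip()

# Strip whitespace from every cell; all columns hold strings here.
scraped = scraped.apply(lambda col: col.str.strip())

print(list(scraped.columns))
print(scraped.loc[0, "Date"])
```

Stripping at scrape time, as shown above with rstrip, is usually simpler; this variant is mainly useful when the data has already been saved or loaded with the newlines in place.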
- final_df.columns = ['Date']
produces your error. A DataFrame needs as many column labels as it has columns, so in your case a list of 6 elements is expected. I'm not sure what you want to do here; I think you can simply remove this line.
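The error can be reproduced on a small frame. This is a minimal sketch with two columns instead of six; the new label "Closing price" is made up for illustration. If the intent was to relabel only some columns, `rename` with a mapping is one way to do that without supplying every label.

```python
import pandas as pd

df_small = pd.DataFrame([["Jun 30, 2017", "9,520.90"]], columns=["Date", "Close"])

message = ""
try:
    # One label for two columns: pandas refuses the assignment.
    df_small.columns = ["Date"]
except ValueError as exc:
    message = str(exc)

print(message)

# rename only touches the columns named in the mapping.
renamed = df_small.rename(columns={"Close": "Closing price"})
print(list(renamed.columns))
```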
- The format you specify for date parsing does not match your data:
['Apr 4, 2017', 'Apr 5, 2017', 'Apr 6, 2017',...]
See the Python documentation on strftime/strptime format codes. Use instead:
final_df['Date'] = pd.to_datetime(final_df['Date'], format='%b %d, %Y')
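A format string can be checked against one sample value with the standard library before it is applied to the whole column. The helper `try_parse` below is a hypothetical name used only for this sketch, and `%b` matches abbreviated month names in the current locale, so an English or C locale is assumed.

```python
from datetime import datetime

sample = "Apr 4, 2017"

def try_parse(text, fmt):
    """Return the parsed datetime, or None if the format does not match."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

# The format from the question expects something like "2017-04-04".
wrong = try_parse(sample, "%Y-%m-%d")

# %b = abbreviated month name, %d = day of month, %Y = four-digit year.
right = try_parse(sample, "%b %d, %Y")

print(wrong)
print(right)
```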
- Convert your data to numeric values so you can plot them:
final_df['Close'] = [float(val.replace(',', '')) for val in final_df['Close']]
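The list comprehension above works for Close, but it would raise on a column such as Volume, which contains "-" in the sample output. One possible alternative, sketched here with a hypothetical helper `to_number`, is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Close": ["9,520.90", "9,504.10"], "Volume": ["-", "-"]}
)

def to_number(column):
    """Drop thousands separators, then parse; bad values become NaN."""
    cleaned = column.str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

prices["Close"] = to_number(prices["Close"])
prices["Volume"] = to_number(prices["Volume"])

print(prices.dtypes)
```

Whether NaN is the right treatment for missing volume figures depends on what you plan to plot; for the Date against Close plot it does not matter.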
- Finally you can call:
final_df.plot(x='Date', y='Close')
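Two details may be worth handling before plotting, although the answer does not require them: the pages arrive newest first, and concatenating pages without `ignore_index=True` repeats the row labels 0 to 29 on every page. The sketch below uses one already-cleaned row per page, taken from the sample output; `page_1`, `page_2`, `combined` and `ordered` are illustrative names.

```python
import pandas as pd

page_1 = pd.DataFrame(
    {"Date": pd.to_datetime(["2017-06-30"]), "Close": [9520.90]}
)
page_2 = pd.DataFrame(
    {"Date": pd.to_datetime(["2017-06-29"]), "Close": [9504.10]}
)

# ignore_index=True gives each row a unique label across pages.
combined = pd.concat([page_1, page_2], ignore_index=True)

# Sort oldest first so the line runs left to right in time.
ordered = combined.sort_values("Date").reset_index(drop=True)

print(ordered)
```

With the frame in this shape, `ordered.plot(x='Date', y='Close')` followed by `plt.savefig('foo.png')` should produce the plot the question was aiming for.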