Problem description
I'm following the finance tutorials on PythonProgramming.net and have run into an issue when I try to combine several dataframes into one large dataframe. I created a function to do this:
def compile_data():
    with open("sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        try:
            df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
            df.set_index('Date', inplace=True)
            df.rename(columns={'Close': ticker}, inplace=True)
            df.drop(['Open', 'High', 'Low', 'Volume'], 1, inplace=True)
            if main_df.empty:
                main_df = df
            else:
                main_df.join(df, how='outer')
            print(main_df.head())
            if count % 10 == 0:
                print(count)
        except Exception:
            pass
    print(main_df.head())
    main_df.to_csv('sp500joinedcloses.csv')
(The reason I used the try/except in the above code was because I have a list of all the tickers for the S&P500, but wasn't able to grab data from Google Finance API for all of them... so this way, if it tries to find a csv that I don't have, it will still combine the ones I do have without throwing an error.)
When I run this function, it creates a CSV called sp500joinedcloses.csv, but it only contains the data for one ticker, namely ABBV. I know it is iterating through the tickers properly, because if I add a print(ticker) in the for loop, all the correct tickers are printed.
It's also worth noting that ABBV isn't the first csv I have that should be included in the dataframe. The first ticker that should have a file is AAPL, and then ABBV. No idea why it seems to skip AAPL.
I would appreciate any help. I am a beginner to pandas, and really want to learn everything I can about it.
Recommended answer
IIUC:
You don't want to use join here. DataFrame.join returns a new dataframe rather than modifying main_df in place, so the result in your loop is thrown away on every iteration. Use pd.concat instead:
main_df = pd.concat([main_df, df], axis=1)
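To make the difference concrete, here is a minimal sketch using two made-up single-column frames. The tickers AAA and BBB and their values are hypothetical toy data, not real prices. It is only meant to show that join leaves the calling frame untouched unless the result is assigned, while pd.concat with axis=1 aligns the columns on the shared Date index:

```python
import pandas as pd

# Hypothetical toy data standing in for two per-ticker CSVs.
a = pd.DataFrame(
    {"AAA": [1.0, 2.0]},
    index=pd.Index(["2017-01-03", "2017-01-04"], name="Date"),
)
b = pd.DataFrame(
    {"BBB": [3.0, 4.0]},
    index=pd.Index(["2017-01-04", "2017-01-05"], name="Date"),
)

main_df = a
main_df.join(b, how="outer")      # returns a new frame; the result is discarded
print(list(main_df.columns))      # main_df still holds only the AAA column

joined = main_df.join(b, how="outer")    # assigning the result keeps both columns
combined = pd.concat([a, b], axis=1)     # outer alignment on the Date index
print(combined)
```

Dates that appear in only one of the frames show up as NaN in the other column, which is the same behaviour an outer join would give.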
However, I would recommend this to replace your whole process:
import pickle
import pandas as pd

def read_file(ticker):
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker)).set_index('Date')
    return df.Close.rename(ticker)

with open("sp500tickers.pickle", "rb") as f:
    tickers = pickle.load(f)
# One Close column per ticker, aligned on Date
main_df = pd.concat([read_file(t) for t in tickers], axis=1)
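Since the question mentions that some tickers have no CSV, the version above would raise on the first missing file. One possible way to handle that without a blanket try/except is sketched below. The function names read_close and compile_closes and the folder parameter are my own illustrative choices, not part of the original answer, and the sketch assumes each CSV has Date and Close columns as in the question:

```python
import os

import pandas as pd


def read_close(ticker, folder="stock_dfs"):
    """Return the Close column for one ticker as a Series, or None if its CSV is missing."""
    path = os.path.join(folder, "{}.csv".format(ticker))
    if not os.path.exists(path):
        return None
    df = pd.read_csv(path, index_col="Date")
    return df["Close"].rename(ticker)


def compile_closes(tickers, folder="stock_dfs"):
    """Combine the Close series of every ticker that has a CSV into one dataframe."""
    series = [read_close(t, folder) for t in tickers]
    series = [s for s in series if s is not None]
    if not series:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(series, axis=1)
```

You would call it with the list loaded from sp500tickers.pickle, for example main_df = compile_closes(tickers), and then write main_df out with to_csv as before. Checking only for a missing file, rather than catching every Exception, means that real problems such as a misspelled column name still surface as errors instead of being silently skipped.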