Pandas .join not working when merging S&P 500 stock data

Problem description

I'm following the finance tutorials on PythonProgramming.net and have run into an issue when I try to combine several dataframes into one large dataframe. I created a function to do this:

def compile_data():
    with open ("sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)

    main_df = pd.DataFrame()

    for count,ticker in enumerate(tickers):
        try:
            df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
            df.set_index('Date', inplace=True)
            df.rename(columns={'Close':ticker}, inplace=True)
            df.drop(['Open','High','Low','Volume'], 1, inplace=True)
            if main_df.empty:
                main_df = df
            else:
                main_df.join(df, how='outer')
                print(main_df.head())
            if count % 10 == 0:
                print(count)
        except Exception:
            pass

    print(main_df.head())
    main_df.to_csv('sp500joinedcloses.csv')

(The reason I used the try/except in the above code was because I have a list of all the tickers for the S&P500, but wasn't able to grab data from Google Finance API for all of them... so this way, if it tries to find a csv that I don't have, it will still combine the ones I do have without throwing an error.)

When I run this function, it creates a CSV called sp500joinedcloses.csv, but it only contains the data for one ticker, namely ABBV. I know it is iterating through the tickers properly, because if I add a print(ticker) in the for loop, all the correct tickers are printed.

It's also worth noting that ABBV isn't the first csv that I have that should be included in the dataframe. The first ticker that should have a file is AAPL, and then ABBV. No idea why it seems to skip AAPL.

I would appreciate any help. I am a beginner to pandas, and really want to learn everything I can about it.

Recommended answer

IIUC:

You don't want to use join and you couldn't anyway if you start with an empty dataframe. Use pd.concat instead:

main_df = pd.concat([main_df, df], axis=1)
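
It may also help to see why the original loop never grows main_df: DataFrame.join returns a new DataFrame and leaves the caller untouched, so a result that is not assigned back is simply discarded. A minimal sketch follows, using two tiny in-memory frames with made-up prices in place of the real CSVs (the values and dates are assumptions for illustration only):

```python
import pandas as pd

# Two tiny frames standing in for per-ticker CSVs (values are made up).
aapl = pd.DataFrame(
    {"AAPL": [1.0, 2.0]},
    index=pd.Index(["2017-01-03", "2017-01-04"], name="Date"),
)
abbv = pd.DataFrame(
    {"ABBV": [3.0, 4.0]},
    index=pd.Index(["2017-01-04", "2017-01-05"], name="Date"),
)

main_df = aapl
main_df.join(abbv, how="outer")  # result is discarded; main_df is unchanged
unchanged_columns = list(main_df.columns)

# Assigning the result keeps the new column.
joined = main_df.join(abbv, how="outer")

# pd.concat with axis=1 also aligns on the index (outer join by default).
concatenated = pd.concat([aapl, abbv], axis=1)

print(unchanged_columns)
print(list(joined.columns))
print(list(concatenated.columns))
```

Either assigning the join result or switching to pd.concat gives one column per ticker, with NaN where a ticker has no row for a date.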

However, I would recommend this to replace your whole process:

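import pickle

import pandas as pd

def read_file(ticker):
    # Load one ticker's CSV and index it by date
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker)).set_index('Date')
    # Keep only the closing price, as a Series named after the ticker
    return df.Close.rename(ticker)

with open("sp500tickers.pickle", "rb") as f:
    tickers = pickle.load(f)

# Place the series side by side, aligned on the Date index
main_df = pd.concat([read_file(t) for t in tickers], axis=1)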
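
Since the question mentions that some tickers have no CSV, the list comprehension above would raise on the first missing file. One possible variation, offered only as a sketch, skips tickers whose file does not exist. The helper name compile_closes and the folder parameter are hypothetical, and the demo writes throwaway files with made-up prices to a temporary directory instead of reading stock_dfs:

```python
import os
import tempfile

import pandas as pd


def read_file(ticker, folder):
    """Return the Close column of one ticker's CSV as a Series named after the ticker."""
    path = os.path.join(folder, "{}.csv".format(ticker))
    df = pd.read_csv(path).set_index("Date")
    return df["Close"].rename(ticker)


def compile_closes(tickers, folder):
    """Concatenate Close series column-wise, skipping tickers with no CSV.

    Hypothetical helper, not part of the original answer.
    """
    series = [
        read_file(t, folder)
        for t in tickers
        if os.path.exists(os.path.join(folder, "{}.csv".format(t)))
    ]
    return pd.concat(series, axis=1)


# Demo with throwaway files and made-up prices; "MMM" deliberately has no CSV.
demo_dir = tempfile.mkdtemp()
for name, closes in {"AAPL": [1.0, 2.0], "ABBV": [3.0, 4.0]}.items():
    pd.DataFrame({"Date": ["2017-01-03", "2017-01-04"], "Close": closes}).to_csv(
        os.path.join(demo_dir, name + ".csv"), index=False
    )

demo_df = compile_closes(["MMM", "AAPL", "ABBV"], demo_dir)
print(demo_df)
```

Checking for the file explicitly, instead of wrapping the whole loop body in a bare try/except, keeps unrelated errors visible. Note that pd.concat raises a ValueError if no files are found at all, since it cannot concatenate an empty list.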

09-02 20:17