Problem description
I have several files with time-series data for various stocks, each with several fields. Each file contains
time, open, high, low, close, volume
The goal is to get all of that into one DataFrame of the form
field open high ...
security hk_1 hk_2 hk_3 ... hk_1 hk_2 hk_3 ... ...
time
t_1 open_1_1 open_2_1 open_3_1 ... high_1_1 high_2_1 high_3_1 ... ...
t_2 open_1_2 open_2_2 open_3_2 ... high_1_2 high_2_2 high_3_2 ... ...
... ... ... ... ... ... ... ... ... ...
I created a MultiIndex
fields = ['time','open','high','low','close','volume','numEvents','value']
midx = pd.MultiIndex.from_product([['security_name'], fields], names=['security', 'field'])
and, as a start, tried to apply that MultiIndex to the DataFrame I get from reading the CSV data (by creating a new DataFrame and passing the index to it)
for c in eqty_names_list:
    midx = pd.MultiIndex.from_product([[c], fields], names=['security', 'field'])
    df_temp = pd.read_csv('{}{}.csv'.format(path, c))
    df_temp = pd.DataFrame(df_temp, columns=midx, index=df_temp['time'])
    df_temp.df_name = c
    all_dfs.append(df_temp)
However, the new DataFrame contains only NaN
security 1_HK
field time open high low close volume
time
NaN NaN NaN NaN NaN NaN NaN
It also still contains a column for time, although I tried to make that the index (so that I can later join the DataFrames for all the other stocks by index to get the aggregated DataFrame).
How can I apply the MultiIndex to the DataFrame without losing my data, and then later join the DataFrames that look like this
security 1_HK
field time open high low close volume
time
to create something like the following (note that the hierarchy levels field and security are switched)
field time open high ...
security 1_HK 2_HK ... 1_HK 2_HK ... ...
time
Recommended answer
I think you can first collect all the file paths in a list files, then use a list comprehension to read them into DataFrames and concat them by columns (axis=1). If you add the parameter keys, you get a MultiIndex in the columns:
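import glob
import pandas as pd

# sorted() makes the file order deterministic so it lines up with the keys below
files = sorted(glob.glob('files/*.csv'))
dfs = [pd.read_csv(fp) for fp in files]
# one key per file, in the same order as files
eqty_names_list = ['hk1','hk2','hk3']
df = pd.concat(dfs, keys=eqty_names_list, axis=1)
print(df)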
hk1 hk2 hk3
a b c a b c a b c
0 0 1 2 0 9 6 0 7 1
1 1 5 8 1 6 4 1 3 2
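Because keys are matched to the DataFrames purely by position, a hand-written list of names can drift out of step with the order of the files. One way to avoid that, sketched here as an assumption rather than as part of the original answer, is to derive each key from its file name and pass a dict to concat. The temporary directory and the sample contents below are hypothetical stand-ins for the real files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the real per-security CSV files.
samples = {
    'hk1': 'a,b,c\n0,1,2\n1,5,8\n',
    'hk2': 'a,b,c\n0,9,6\n1,6,4\n',
    'hk3': 'a,b,c\n0,7,1\n1,3,2\n',
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        (Path(tmp) / f'{name}.csv').write_text(text)

    # Map each file stem (e.g. 'hk1') to its DataFrame; sorting keeps the order stable.
    frames = {fp.stem: pd.read_csv(fp) for fp in sorted(Path(tmp).glob('*.csv'))}

# Passing a dict to concat uses the dict keys as the outer column level.
df_by_name = pd.concat(frames, axis=1)
print(df_by_name)
```

With this approach each column block is labelled by the file it came from, so no separate list of names has to be kept in sync.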
Finally, you need swaplevel and sort_index:
df.columns = df.columns.swaplevel(0,1)
df = df.sort_index(axis=1)
print (df)
a b c
hk1 hk2 hk3 hk1 hk2 hk3 hk1 hk2 hk3
0 0 0 0 1 9 7 2 6 1
1 1 1 1 5 6 3 8 4 2
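As for the NaN result in the question: passing an existing DataFrame to pd.DataFrame together with columns= and index= selects by label rather than relabelling, so column and index labels that do not exist in the source come back as NaN. The sketch below applies the approach above to data shaped like the question's files. It assumes each CSV has a time column plus the price fields; the in-memory CSV strings and their values are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd

# Hypothetical in-memory CSVs standing in for the per-stock files.
raw = {
    '1_HK': 'time,open,high,low,close,volume\n'
            '09:30,10.0,10.5,9.9,10.2,1000\n'
            '09:31,10.2,10.6,10.1,10.4,1200\n',
    '2_HK': 'time,open,high,low,close,volume\n'
            '09:30,20.0,20.5,19.9,20.2,500\n'
            '09:31,20.2,20.6,20.1,20.4,700\n',
}

# index_col='time' moves the time column into the index at read time,
# so it no longer appears as a data column.
frames = {name: pd.read_csv(io.StringIO(text), index_col='time')
          for name, text in raw.items()}

# The dict keys become the outer column level; names labels both levels.
combined = pd.concat(frames, axis=1, names=['security', 'field'])

# Put 'field' on top and group the columns by field.
combined = combined.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(combined)
```

Because the DataFrames are aligned on the time index during concat, timestamps missing from one stock simply produce NaN in that stock's columns rather than shifting rows.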