I have a dataframe that looks like this:
Name 2012 2013 2014
7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
A a b c d e f g a b c d e f g h i j k l m a b c d e f g h i j k l m
B a b c d e f g a b c d e f g h i j k l m a b c d e f g h i j k l m
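And so on. 2012, 2013 and 2014 are years, and the numbers under them are months. a, b, c, d, e ... are the values of each Name (A, B, ...) in the corresponding month. The real a, b, c, d, e ... values differ for each name; they are written this way only for illustration.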
So far, I have done the following:
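import pandas as pd

workbook = pd.ExcelFile('XYZ.xlsx')
#newer pandas versions spell this keyword sheet_name (the old sheetname was removed)
df = workbook.parse(sheet_name='Page1-2')
df2 = pd.melt(df, id_vars=["Name"],
              var_name="Date", value_name="Value")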
That is, I read XYZ.xlsx into df and used pd.melt to reshape df into df2. The output of df2 looks like this:
Name Date Value
A 2012 a
A Unnamed b
A Unnamed c
A Unnamed d
A Unnamed e
A Unnamed f
A Unnamed g
A 2013 a
A Unnamed b
A Unnamed c
A Unnamed d
A Unnamed e
and so on for the other years and names. I would like my Date column to look like this:
Date
7/2012
8/2012
9/2012
10/2012
11/2012
12/2012
1/2013
2/2013
3/2013
4/2013
5/2013
6/2013
7/2013
8/2013
based on the months and years given in the original dataframe. I'm not sure how to do this. Any help is much appreciated!
Printing my sample data (df.to_dict()):
{'Name': {0: nan, 1: 'A', 2: 'B'}, 2012: {0: '07', 1: 'a', 2: 'a'},'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 3': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 4': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 5': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 6': {0: '12', 1: 'f', 2: 'f'}, '2013': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 9': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 10': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 11': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 12': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 13': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 14': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 15': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 16': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 17': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 18': {0: '12', 1: 'f', 2: 'f'}, '2014': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 20': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 21': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 22': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 23': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 24': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 25': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 26': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 27': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 28': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 29': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 30': {0: '12', 1: 'f', 2: 'f'}}
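A minimal reproduction of the problem, assuming a trimmed copy of the df.to_dict() output above (only a few month columns are kept, so the data is a hypothetical subset). It shows why melt alone cannot build the desired Date column: melt takes the Date values straight from the column labels, and the month numbers are ordinary cell values in the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical, trimmed copy of the df.to_dict() output above.
df = pd.DataFrame({
    'Name': {0: np.nan, 1: 'A', 2: 'B'},
    2012: {0: '07', 1: 'a', 2: 'a'},
    'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'},
    'Unnamed: 3': {0: '09', 1: 'c', 2: 'c'},
    '2013': {0: '01', 1: 'a', 2: 'a'},
    'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'},
})

# melt() copies the raw column labels into Date, so every column after the
# first one of each year keeps its 'Unnamed: N' placeholder ...
df2 = pd.melt(df, id_vars=['Name'], var_name='Date', value_name='Value')
print(df2.head(6))

# ... and the month numbers from row 0 end up as plain values in rows whose Name is NaN.
print(df2[df2['Name'].isna()])
```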
Best answer
Use:
#create index with column Name
df = df.set_index('Name')
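#create MultiIndex columns: years (the 'Unnamed:' placeholders replaced by forward fill) and months from the first row
#cast the labels to str first, because 2012 can arrive as an int while the other labels are strings
idx = pd.Series(df.columns.astype(str))
df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                        df.iloc[0]], names=('Date','Month'))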
#remove first row
df = df.iloc[1:]
print (df)
Date 2012 2013 ... 2014
Month 07 08 09 10 11 12 01 02 03 04 ... 03 04 05 06 07 08 09 10 11 12
Name ...
A a b c d e f a b c d ... c d e f g h i j k l
B a b c d e f a b c d ... c d e f g h i j k l
print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Month'])
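The key trick in the step above is replacing each 'Unnamed:' label with the nearest year to its left. A small sketch on a hypothetical list of labels shaped like the ones in df.to_dict() (where 2012 is an int and the other labels are strings):

```python
import pandas as pd

# Hypothetical column labels: only the first column of each year carries the
# year, the rest are 'Unnamed: N' placeholders.
labels = pd.Index([2012, 'Unnamed: 2', 'Unnamed: 3', '2013', 'Unnamed: 8', '2014'])

# Cast to str so .str.contains sees every label, blank out the placeholders
# with mask(), then forward-fill the last year seen.
idx = pd.Series(labels.astype(str))
years = idx.mask(idx.str.contains('Unnamed:')).ffill()
print(years.tolist())
```

Without the astype(str) cast, .str.contains yields NaN for the int label, and the year level would mix an int with strings, which breaks the later Month + '/' + Date string concatenation.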
#reshape
df2 = df.unstack().reset_index(name='Value')
df2['Date'] = df2['Month'] + '/' + df2['Date']
df2 = df2.drop('Month', axis=1)
print (df2)
Date Name Value
0 07/2012 A a
1 07/2012 B a
2 08/2012 A b
3 08/2012 B b
4 09/2012 A c
5 09/2012 B c
6 10/2012 A d
7 10/2012 B d
8 11/2012 A e
9 11/2012 B e
10 12/2012 A f
11 12/2012 B f
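Putting the steps together, a self-contained sketch on a hypothetical trimmed copy of the question's sample data. It casts the column labels to str and converts the month with int() so that '07' becomes '7', matching the 7/2012 format requested in the question. The last line optionally parses the strings into timestamps with format='%m/%Y'; the column name Period is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical, trimmed copy of the question's df.to_dict() output.
df = pd.DataFrame({
    'Name': {0: np.nan, 1: 'A', 2: 'B'},
    2012: {0: '07', 1: 'a', 2: 'a'},
    'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'},
    '2013': {0: '01', 1: 'a', 2: 'a'},
    'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'},
})

df = df.set_index('Name')
idx = pd.Series(df.columns.astype(str))
df.columns = pd.MultiIndex.from_arrays(
    [idx.mask(idx.str.contains('Unnamed:')).ffill(), df.iloc[0]],
    names=('Date', 'Month'),
)
df = df.iloc[1:]  # drop the row that held the month numbers

df2 = df.unstack().reset_index(name='Value')
# int() drops the leading zero, so '07' becomes '7'
df2['Date'] = df2['Month'].astype(int).astype(str) + '/' + df2['Date']
df2 = df2.drop(columns='Month')
print(df2)

# Optional: a real datetime column (day defaults to the 1st of the month).
df2['Period'] = pd.to_datetime(df2['Date'], format='%m/%Y')
```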
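If df is read from a file, pass header=[0,1] to read the first and second rows into a MultiIndex, and index_col=[0] to read the first column, Name, into the index. The solution then changes slightly:
df = pd.read_csv('filename', header=[0,1], index_col=[0])
#level 0 holds the years and 'Unnamed:' placeholders, level 1 holds the months
idx = pd.Series(df.columns.get_level_values(0))
df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                        df.columns.get_level_values(1)],
                                       names=('Date','Month'))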
print (df)
Date 2012 2013 ... 2014
Month 07 08 09 10 11 12 01 02 03 04 ... 03 04 05 06 07 08 09 10 11 12
Name ...
A a b c d e f a b c d ... c d e f g h i j k l
B a b c d e f a b c d ... c d e f g h i j k l
print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Month'])
#reshape
df2 = df.unstack().reset_index(name='Value').rename(columns={'level_2':'Name'})
df2['Date'] = df2['Month'].astype(str) + '/' + df2['Date'].astype(str)
#optional: convert Date to datetime (first day of each month); %Y is the four-digit year
#df2['Date'] = pd.to_datetime(df2['Date'].radd('1/'), format='%d/%m/%Y')
df2 = df2.drop('Month', axis=1)
print (df2)
Date Name Value
0 07/2012 A a
1 07/2012 B a
2 08/2012 A b
3 08/2012 B b
4 09/2012 A c
5 09/2012 B c
6 10/2012 A d
7 10/2012 B d
8 11/2012 A e