考虑通常的trades
数据帧:
trades = pd.DataFrame({
'time': pd.to_datetime(['20160525 13:30:00.023',
'20160525 13:30:00.038',
'20160525 13:30:00.048',
'20160525 13:30:00.048',
'20160525 13:30:00.048']),
'ticker': ['MSFT', 'MSFT','GOOG', 'BOOB', 'AAPL'],
'price': [51.95, 51.95, 720.77, 720.92, 98.00],
'quantity': [75, 155, 100, 100, 100]},
columns=['time', 'ticker', 'price', 'quantity'])
trades
Out[42]:
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 BOOB 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
在这里,我想从长到宽重塑,做一些事情,然后从宽到长重塑。
从长到宽很容易
trades.set_index(['time','ticker'], inplace = True)
trades = trades.unstack()
trades
Out[44]:
price quantity \
ticker AAPL BOOB GOOG MSFT AAPL BOOB GOOG
time
2016-05-25 13:30:00.023 NaN NaN NaN 51.95 NaN NaN NaN
2016-05-25 13:30:00.038 NaN NaN NaN 51.95 NaN NaN NaN
2016-05-25 13:30:00.048 98.0 720.92 720.77 NaN 100.0 100.0 100.0
ticker MSFT
time
2016-05-25 13:30:00.023 75.0
2016-05-25 13:30:00.038 155.0
2016-05-25 13:30:00.048 NaN
但是由于多种原因,我现在确实想在列上使用此多索引,因此我有以下代码:
trades.columns=['_'.join(t) for t in trades.columns]
这基本上摆脱了多索引,并允许我使用普通列。数据现在看起来像:
trades
Out[47]:
price_AAPL price_BOOB price_GOOG price_MSFT \
time
2016-05-25 13:30:00.023 NaN NaN NaN 51.95
2016-05-25 13:30:00.038 NaN NaN NaN 51.95
2016-05-25 13:30:00.048 98.0 720.92 720.77 NaN
quantity_AAPL quantity_BOOB quantity_GOOG \
time
2016-05-25 13:30:00.023 NaN NaN NaN
2016-05-25 13:30:00.038 NaN NaN NaN
2016-05-25 13:30:00.048 100.0 100.0 100.0
quantity_MSFT
time
2016-05-25 13:30:00.023 75.0
2016-05-25 13:30:00.038 155.0
2016-05-25 13:30:00.048 NaN
问题是:现在如何恢复长格式?
最佳答案
您可以使用:
#create MultiIndex from columns
trades.columns = trades.columns.str.split('_', expand=True)
#stack and set index names for new column names
trades = trades.stack().rename_axis(['time','ticker']).reset_index()
#convert to int
trades.quantity = trades.quantity.astype(int)
print (trades)
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 AAPL 98.00 100
3 2016-05-25 13:30:00.048 BOOB 720.92 100
4 2016-05-25 13:30:00.048 GOOG 720.77 100
关于python - Pandas :列的多索引丢失后,从宽到长,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/42114760/