考虑通常的trades数据帧:

trades = pd.DataFrame({
         'time': pd.to_datetime(['20160525 13:30:00.023',
                      '20160525 13:30:00.038',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048']),
         'ticker': ['MSFT', 'MSFT','GOOG', 'BOOB', 'AAPL'],
         'price': [51.95, 51.95, 720.77, 720.92, 98.00],
         'quantity': [75, 155, 100, 100, 100]},
        columns=['time', 'ticker', 'price', 'quantity'])

trades
Out[42]:
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100


在这里,我想从长到宽重塑,做一些事情,然后从宽到长重塑。

从长到宽很容易

trades.set_index(['time','ticker'], inplace = True)
trades = trades.unstack()

trades
Out[44]:
                        price                        quantity                \
ticker                   AAPL    BOOB    GOOG   MSFT     AAPL   BOOB   GOOG
time
2016-05-25 13:30:00.023   NaN     NaN     NaN  51.95      NaN    NaN    NaN
2016-05-25 13:30:00.038   NaN     NaN     NaN  51.95      NaN    NaN    NaN
2016-05-25 13:30:00.048  98.0  720.92  720.77    NaN    100.0  100.0  100.0


ticker                    MSFT
time
2016-05-25 13:30:00.023   75.0
2016-05-25 13:30:00.038  155.0
2016-05-25 13:30:00.048    NaN


但是由于多种原因,我现在确实想在列上使用此多索引,因此我有以下代码:

trades.columns=['_'.join(t) for t in trades.columns]


这基本上摆脱了多索引,并允许我使用普通列。数据现在看起来像:

trades
Out[47]:
                         price_AAPL  price_BOOB  price_GOOG  price_MSFT  \
time
2016-05-25 13:30:00.023         NaN         NaN         NaN       51.95
2016-05-25 13:30:00.038         NaN         NaN         NaN       51.95
2016-05-25 13:30:00.048        98.0      720.92      720.77         NaN

                         quantity_AAPL  quantity_BOOB  quantity_GOOG  \
time
2016-05-25 13:30:00.023            NaN            NaN            NaN
2016-05-25 13:30:00.038            NaN            NaN            NaN
2016-05-25 13:30:00.048          100.0          100.0          100.0

                         quantity_MSFT
time
2016-05-25 13:30:00.023           75.0
2016-05-25 13:30:00.038          155.0
2016-05-25 13:30:00.048            NaN


问题是:现在如何恢复长格式?

最佳答案

您可以使用:

#create MultiIndex from columns
trades.columns = trades.columns.str.split('_', expand=True)
#stack and set index names for new column names
trades = trades.stack().rename_axis(['time','ticker']).reset_index()
#convert to int
trades.quantity = trades.quantity.astype(int)
print (trades)
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   AAPL   98.00       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   GOOG  720.77       100

关于python - Pandas :列的多索引丢失后,从宽到长,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/42114760/

10-12 16:41