Given the following DataFrame:
import pandas as pd

# assign the frame to df so that the answer below can refer to it
df = pd.DataFrame({
    "start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
    "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
    "status": ["OK", "ERROR", "OK"],
})
Have:
| start | end | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK |
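I would like to convert it to another format: "unfold" the intervals into a DatetimeIndex with one row per second, and resample the data so that seconds not covered by any interval show up as NaN. The result should look like this:
Want: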
| | status |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK |
| 2017-01-01 13:09:02 | OK |
| 2017-01-01 13:09:03 | OK |
| 2017-01-01 13:09:04 | OK |
| 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:06 | NaN |
| 2017-01-01 13:09:07 | ERROR |
| 2017-01-01 13:09:08 | ERROR |
| 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:10 | NaN |
| 2017-01-01 13:09:11 | NaN |
| 2017-01-01 13:09:12 | OK |
| 2017-01-01 13:09:13 | OK |
| 2017-01-01 13:09:14 | OK |
Any help is greatly appreciated!
Best answer
Use an IntervalIndex:
# convert the string columns to datetimes first
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
# create an IntervalIndex from start/end
iv_idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')
# generate the desired index of individual times
new_idx = pd.date_range(df['start'].min(), df['end'].max(), freq='s')
# set the index of 'status' as the IntervalIndex, then reindex to the new index
# (set_axis returns a new Series; the inplace argument was removed in pandas 2.0)
result = df['status'].set_axis(iv_idx).reindex(new_idx)
result
The resulting output:
2017-01-01 13:09:01 OK
2017-01-01 13:09:02 OK
2017-01-01 13:09:03 OK
2017-01-01 13:09:04 OK
2017-01-01 13:09:05 OK
2017-01-01 13:09:06 NaN
2017-01-01 13:09:07 ERROR
2017-01-01 13:09:08 ERROR
2017-01-01 13:09:09 ERROR
2017-01-01 13:09:10 NaN
2017-01-01 13:09:11 NaN
2017-01-01 13:09:12 OK
2017-01-01 13:09:13 OK
2017-01-01 13:09:14 OK
Freq: S, Name: status, dtype: object
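
For anyone who wants to paste and run this in one go, here is a minimal self-contained sketch of the same IntervalIndex approach. The helper name `expand_intervals` and its `freq` parameter are made up for illustration and are not part of pandas. It builds the Series with the plain `pd.Series` constructor instead of `set_axis`, purely to sidestep differences between pandas versions, and it assumes the intervals do not overlap.

```python
import pandas as pd


def expand_intervals(df, freq="s"):
    """Hypothetical helper: one row per timestamp, NaN where no interval applies."""
    # The sample data stores timestamps as strings, so parse them first.
    start = pd.to_datetime(df["start"])
    end = pd.to_datetime(df["end"])
    # closed="both" so that the start and the end instant both belong to the interval.
    intervals = pd.IntervalIndex.from_arrays(start, end, closed="both")
    status = pd.Series(df["status"].to_numpy(), index=intervals, name="status")
    # Every timestamp from the earliest start to the latest end.
    timeline = pd.date_range(start.min(), end.max(), freq=freq)
    # Each timestamp takes the value of the interval that contains it, if any.
    return status.reindex(timeline)


df = pd.DataFrame({
    "start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
    "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
    "status": ["OK", "ERROR", "OK"],
})
result = expand_intervals(df)
print(result)
```

If two intervals overlap, or share an endpoint while both are closed, a timestamp no longer maps to exactly one interval, so I would not expect the reindex step to work as-is; in that case you would have to decide first which status wins.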