I created a DataFrame for 2018-01-01 that contains only a datetime column at 1-second intervals, as shown in the code below.

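import pandas as pd

# One timestamp per second on 2018-01-01 (the range ends at 23:59:00)
i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)
# Move the DatetimeIndex into a regular column named 'datetime'
ts = ts.reset_index()
ts = ts.rename(columns={'index': 'datetime'})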


df1 (called ts in the code above):

    datetime
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16
17  2018-01-01 00:00:17
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29


I have another DataFrame with a datetime column and several other columns.

df2:

    datetime                a       b        c         d        e
0   2018-01-01 00:00:04     0.9
1   2018-01-01 00:00:06             0.6      0.7
2   2018-01-01 00:00:09     0.5              0.7       0.8
3   2018-01-01 00:00:16             2.3      3.6       4.9      5.0
4   2018-01-01 00:00:17     0.9     3.5      5.5
5   2018-01-01 00:00:23     0.1     0.6      0.0       1.7
6   2018-01-01 00:00:29     2.7     5.5      4.3


Now I am trying to match up the datetime columns of df1 and df2 with a pandas outer join, and I expect the result to look like this:

    datetime                a       b       c     d      e
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04     0.9
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06             0.6      0.7
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09     0.5              0.7   0.8
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16             2.3      3.6   4.9   5.0
17  2018-01-01 00:00:17     0.9     3.5      5.5
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23     0.1     0.6      0.0   1.7
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29     2.7     5.5      4.3


But my output looks like this:

    datetime                a       b        c         d        e
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16
17  2018-01-01 00:00:17
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29
30  2018-01-01 00:00:04     0.9
31  2018-01-01 00:00:06             0.6      0.7
32  2018-01-01 00:00:09     0.5              0.7       0.8
33  2018-01-01 00:00:16             2.3      3.6       4.9      5.0
34  2018-01-01 00:00:17     0.9     3.5      5.5
35  2018-01-01 00:00:23     0.1     0.6      0.0       1.7
36  2018-01-01 00:00:29     2.7     5.5      4.3


The code I used for this operation is:

test = pd.merge(df1, df2, on=['datetime'], how='outer')

I'm not sure how to fix this, and I would appreciate any help.

Best answer

Keep ts indexed by datetime and use reindex, as @Scott Boston mentioned in the comments:

import pandas as pd

i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)

# df here appears to be the second DataFrame (df2 in the question)
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime').reindex(ts.index)


                    a   b   c   d   e
2018-01-01 00:00:00 NaN NaN NaN NaN NaN
2018-01-01 00:00:01 NaN NaN NaN NaN NaN
2018-01-01 00:00:02 NaN NaN NaN NaN NaN
2018-01-01 00:00:03 NaN NaN NaN NaN NaN
2018-01-01 00:00:04 0.9
2018-01-01 00:00:05 NaN NaN NaN NaN NaN
2018-01-01 00:00:06     0.6 0.7
2018-01-01 00:00:07 NaN NaN NaN NaN NaN
2018-01-01 00:00:08 NaN NaN NaN NaN NaN
2018-01-01 00:00:09 0.5     0.7 0.8
2018-01-01 00:00:10 NaN NaN NaN NaN NaN
2018-01-01 00:00:11 NaN NaN NaN NaN NaN
2018-01-01 00:00:12 NaN NaN NaN NaN NaN
2018-01-01 00:00:13 NaN NaN NaN NaN NaN
2018-01-01 00:00:14 NaN NaN NaN NaN NaN
2018-01-01 00:00:15 NaN NaN NaN NaN NaN
2018-01-01 00:00:16     2.3 3.6 4.9 5.0
2018-01-01 00:00:17 0.9 3.5 5.5
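
For anyone who wants to try the reindex approach end to end, here is a minimal sketch. It rests on a few assumptions that the question does not confirm: the sample df2 below copies only three rows and two columns from the question, the blank cells are represented as NaN, and the datetime column starts out as strings. The range is also shortened to 30 seconds so the result stays small.

```python
import numpy as np
import pandas as pd

# One row per second; shortened to the first 30 seconds for readability.
idx = pd.date_range(start="2018-01-01 00:00:00", periods=30, freq="s")

# Assumed sample of df2: datetime stored as strings, blank cells as NaN.
df2 = pd.DataFrame({
    "datetime": ["2018-01-01 00:00:04", "2018-01-01 00:00:06", "2018-01-01 00:00:09"],
    "a": [0.9, np.nan, 0.5],
    "b": [np.nan, 0.6, np.nan],
})

# Parse the strings into datetime64 so the labels can match the DatetimeIndex.
df2["datetime"] = pd.to_datetime(df2["datetime"])

# reindex keeps every timestamp in idx and fills the missing rows with NaN.
result = df2.set_index("datetime").reindex(idx)
print(result.dropna(how="all"))
```

The result has one row per second, and only the timestamps present in df2 carry values. If datetime should be a regular column again, something like `result.rename_axis("datetime").reset_index()` should restore it.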


Option 2: concat

pd.concat([ts, df.set_index('datetime')], axis = 1)
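
A runnable sketch of the concat option, shown next to the merge from the question for comparison. This assumes the extra rows in the question came from df2's datetime column being stored as strings; the pd.to_datetime call in the answer suggests that, but the question does not confirm it. The sample data and the names by_concat and by_merge are made up for illustration.

```python
import numpy as np
import pandas as pd

# Empty frame whose index has one timestamp per second (30 seconds here).
idx = pd.date_range(start="2018-01-01 00:00:00", periods=30, freq="s")
ts = pd.DataFrame(index=idx)

# Assumed sample of df2, with the datetime column already parsed.
df2 = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-01-01 00:00:04", "2018-01-01 00:00:06"]),
    "a": [0.9, np.nan],
    "b": [np.nan, 0.6],
})

# Option 2: concat along columns aligns the two frames on their index labels.
by_concat = pd.concat([ts, df2.set_index("datetime")], axis=1)

# The merge from the question, once both key columns are datetime64.
df1 = ts.rename_axis("datetime").reset_index()
by_merge = pd.merge(df1, df2, on="datetime", how="left")

print(by_concat.dropna(how="all"))
print(by_merge.dropna(subset=["a", "b"], how="all"))
```

Both results have 30 rows. With matching dtypes, how="left" is enough here because every timestamp in df2 already exists in df1; how="outer" would also keep any df2 timestamps that fall outside the range.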

Regarding python - mapping the datetime columns of two tables, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54009571/
