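I have a pandas DataFrame like the one below. The index consists of datetime objects, sorted by day and split into 5-minute bins. It has a column called "col1". If I run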

df['col1']

I get:
DateTime
2008-04-28 09:40:00     300.0
2008-04-28 09:45:00    -800.0
2008-04-28 09:50:00       0.0
2008-04-28 09:55:00    -100.0
2008-04-28 10:00:00       0.0
2008-04-29 09:40:00     500.0
2008-04-29 09:45:00     800.0
2008-04-29 09:50:00     100.0
2008-04-29 09:55:00    -100.0
2008-04-29 10:00:00       0.0

I have a second DataFrame in pandas, obtained from the original DataFrame with groupby:
df2 = df.groupby(df.index.time)[['col2']].mean()

The result is:
           col2
09:40:00   4603.585657
09:45:00   5547.011952
09:50:00   8532.007952
09:55:00   6175.298805
10:00:00   4236.055777

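What I want to do is divide col1 by col2 without using a for loop. To explain better: across all days, divide col1 by col2 for each bin. For example, divide every 9:40:00 value in col1 by the 9:40:00 value in col2.
I don't know how to start on this without a for loop, but I have the impression that pandas should be able to do it.
The expected output is: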
DateTime
2008-04-28 09:40:00     300.0/4603.585657
2008-04-28 09:45:00    -800.0/5547.011952
2008-04-28 09:50:00       0.0/8532.007952
2008-04-28 09:55:00    -100.0/6175.298805
2008-04-28 10:00:00       0.0/4236.055777
2008-04-29 09:40:00     500.0/4603.585657
2008-04-29 09:45:00     800.0/5547.011952
2008-04-29 09:50:00     100.0/8532.007952
2008-04-29 09:55:00    -100.0/6175.298805
2008-04-29 10:00:00       0.0/4236.055777

Best answer

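If you need to divide by the mean for each time of day (the sample data only contains col1, so this uses col1's own mean; replace 'col1' inside the groupby with 'col2' to divide by the per-time mean of col2):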

df['new'] = df['col1'].div(df.groupby(df.index.time)['col1'].transform('mean'))
print (df)
                      col1   new
DateTime
2008-04-28 09:40:00  300.0  0.75
2008-04-28 09:45:00 -800.0  -inf
2008-04-28 09:50:00    0.0  0.00
2008-04-28 09:55:00 -100.0  1.00
2008-04-28 10:00:00    0.0   NaN
2008-04-29 09:40:00  500.0  1.25
2008-04-29 09:45:00  800.0   inf
2008-04-29 09:50:00  100.0  2.00
2008-04-29 09:55:00 -100.0  1.00
2008-04-29 10:00:00    0.0   NaN
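
The question asks for col1 divided by the per-time mean of col2, which the sample above cannot show because col2 is missing from it. The following is a minimal sketch of that variant. The col2 values are invented for illustration and are not the asker's data, and the column name `ratio` is hypothetical.

```python
import pandas as pd

# Two days of 5-minute bins, matching the timestamps in the question.
idx = pd.date_range("2008-04-28 09:40", periods=5, freq="5min").append(
    pd.date_range("2008-04-29 09:40", periods=5, freq="5min")
)

df = pd.DataFrame(
    {
        # col1 values are copied from the question.
        "col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0],
        # col2 values are ASSUMED for this example only.
        "col2": [100.0, 200.0, 400.0, 50.0, 10.0, 300.0, 600.0, 600.0, 150.0, 30.0],
    },
    index=idx,
)
df.index.name = "DateTime"

# transform('mean') returns a Series aligned with df's own index, so every
# row receives the col2 mean of its time-of-day group; no for loop is needed.
col2_mean_by_time = df.groupby(df.index.time)["col2"].transform("mean")
df["ratio"] = df["col1"].div(col2_mean_by_time)

print(df)
```

Because `transform` keeps the original row order and index, the division lines up row by row without any merge or reindex step.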

Or, if you need to divide by the mean for each day:
df['new'] = df['col1'].div(df.groupby(df.index.date)['col1'].transform('mean'))
print (df)
                      col1       new
DateTime
2008-04-28 09:40:00  300.0 -2.500000
2008-04-28 09:45:00 -800.0  6.666667
2008-04-28 09:50:00    0.0 -0.000000
2008-04-28 09:55:00 -100.0  0.833333
2008-04-28 10:00:00    0.0 -0.000000
2008-04-29 09:40:00  500.0  1.923077
2008-04-29 09:45:00  800.0  3.076923
2008-04-29 09:50:00  100.0  0.384615
2008-04-29 09:55:00 -100.0 -0.384615
2008-04-29 10:00:00    0.0  0.000000
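
If the aggregated frame `df2` from the question already exists, another option is to look up each row's time of day in it instead of recomputing the means with `transform`. This is a sketch under the same assumptions as before: the col2 values are invented, and `times` and `ratio_from_df2` are hypothetical names.

```python
import pandas as pd

idx = pd.date_range("2008-04-28 09:40", periods=5, freq="5min").append(
    pd.date_range("2008-04-29 09:40", periods=5, freq="5min")
)
df = pd.DataFrame(
    {
        "col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0],
        # ASSUMED col2 values, for illustration only.
        "col2": [100.0, 200.0, 400.0, 50.0, 10.0, 300.0, 600.0, 600.0, 150.0, 30.0],
    },
    index=idx,
)

# Per-time-of-day means, built the same way as df2 in the question.
df2 = df.groupby(df.index.time)[["col2"]].mean()

# One time-of-day label per row, carrying df's index so the division aligns.
times = pd.Series(df.index.time, index=df.index)

# Series.map looks each time label up in df2's index and returns its mean.
df["ratio_from_df2"] = df["col1"] / times.map(df2["col2"])

print(df)
```

Both routes give the same numbers; the lookup is mainly useful when `df2` comes from somewhere else or is reused for several columns.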

This question, "python - Dividing by bins with pandas", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/55239779/
