Problem description
I have a pandas.DataFrame with timestamps in one column. The values are epoch times spaced 0.1 seconds apart, such as 1488771900.100000, 1488771900.200000 and so on. However, some values are missing: for example, I have 1488794389.500000 followed by 1488794389.900000, with 3 missing values in between. I want to insert rows into the dataframe for every missing value between the minimum and maximum of this column. So if the min is 1488771900.000000 and the max is 1488794660.000000, I want rows for all values at 0.1 second intervals, with NA in all other columns for the inserted rows.
I saw an answer in this link, but wasn't able to reproduce it.
How can I do this?
Recommended answer
You can fill in the missing times using pandas.DataFrame.resample. The caveat is that the dataframe needs to have a pandas.DatetimeIndex. In your case the time is likely stored as a float in seconds since the epoch, so it needs to be converted before resampling. Here is a function that performs that operation.
Code:
import datetime as dt
import pandas as pd

def resample(dataframe, time_column, sample_period):
    # make a copy so the caller's dataframe is not modified
    dataframe = dataframe.copy()
    # convert epoch times to timezone-aware (UTC) datetimes; using UTC
    # keeps local-time and DST offsets from skewing the round trip
    dataframe[time_column] = dataframe[time_column].apply(
        lambda ts: dt.datetime.fromtimestamp(ts, tz=dt.timezone.utc))
    # make the datetimes into an index
    dataframe = dataframe.set_index(time_column)
    # resample to the desired period; inserted rows are filled with NaN
    dataframe = dataframe.resample(sample_period).asfreq().reset_index()
    # convert datetimes back to epoch seconds
    epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
    dataframe[time_column] = dataframe[time_column].apply(
        lambda ts: (ts - epoch).total_seconds())
    return dataframe
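The epoch-to-datetime conversion in the function above can also be done with vectorized pandas calls instead of a per-row apply. The following is a minimal sketch of that step alone, not part of the original answer. The sample values and the variable names (epoch_seconds, as_datetime, round_trip) are made up for illustration, and the rounding to microseconds is an assumption meant to absorb floating-point error so the timestamps land exactly on the 0.1 second grid.

```python
import pandas as pd

# Hypothetical sample: float seconds since the Unix epoch.
epoch_seconds = pd.Series([1488771900.1, 1488771900.2, 1488771900.6])

# Interpret the floats as seconds since 1970-01-01 (UTC, timezone-naive),
# then round to the microsecond to remove floating-point noise.
as_datetime = pd.to_datetime(epoch_seconds, unit="s").dt.round("us")

# This is the kind of index that DataFrame.resample expects.
index = pd.DatetimeIndex(as_datetime)

# Convert back to float seconds since the epoch.
round_trip = (as_datetime - pd.Timestamp("1970-01-01")).dt.total_seconds()
```

Because pd.to_datetime with unit="s" counts from the UTC epoch, the round trip does not depend on the machine's local timezone.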
Test code:
values = [
(1488771900.10, 'a'),
(1488771900.20, 'b'),
(1488771900.30, 'c'),
(1488771900.60, 'f'),
]
columns = ['time', 'value']
df = pd.DataFrame(values, columns=columns)
print(df)
new_df = resample(df, 'time', '100ms')
print(new_df)
Result:
           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09     f
           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09   NaN
4  1.488772e+09   NaN
5  1.488772e+09     f
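As an alternative to the resample approach above, the gaps can also be filled without converting to datetimes, by mapping each timestamp to an integer tick count and reindexing over the full tick range. This is only a sketch under stated assumptions: the function name fill_missing_times and the ticks_per_second parameter are hypothetical, the timestamps are assumed to lie on the grid (10 ticks per second here) with no duplicates, and the time column is moved to the first position in the output.

```python
import numpy as np
import pandas as pd

def fill_missing_times(dataframe, time_column, ticks_per_second=10):
    # Map each float timestamp to an integer tick (tenths of a second by
    # default); rounding absorbs floating-point error.
    ticks = (dataframe[time_column] * ticks_per_second).round().astype("int64")
    # Every tick between the smallest and largest observed tick, inclusive.
    full_ticks = np.arange(ticks.min(), ticks.max() + 1)
    # Index the remaining columns by tick, then reindex over the full range;
    # ticks that had no row get NaN in every column.
    out = dataframe.drop(columns=[time_column])
    out.index = pd.Index(ticks.to_numpy())
    out = out.reindex(full_ticks)
    # Rebuild the float time column from the ticks.
    out.insert(0, time_column, full_ticks / ticks_per_second)
    return out.reset_index(drop=True)

df = pd.DataFrame(
    [(1488771900.10, 'a'), (1488771900.20, 'b'),
     (1488771900.30, 'c'), (1488771900.60, 'f')],
    columns=['time', 'value'])
filled = fill_missing_times(df, 'time')
print(filled)
```

Working in integer ticks avoids comparing floats for equality, which is what makes building the grid directly from float timestamps fragile.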