Problem description
I have a DataFrame with a datetime column and a value column. For each date, I need to find the longest stretch of consecutive null values and replace it with zeros. In the example below, the longest stretch of nulls on January 1st is three rows long, so those three values must be replaced with zero. The same process has to be repeated for January 2nd.
Note: only the longest stretch of null values should be replaced with zero, not the other nulls.
Here is my sample data:
Datetime X
01-01-2018 00:00 1
01-01-2018 00:05 Nan
01-01-2018 00:10 2
01-01-2018 00:15 3
01-01-2018 00:20 2
01-01-2018 00:25 Nan
01-01-2018 00:30 Nan
01-01-2018 00:35 Nan
01-01-2018 00:40 4
02-01-2018 00:00 Nan
02-01-2018 00:05 2
02-01-2018 00:10 2
02-01-2018 00:15 2
02-01-2018 00:20 2
02-01-2018 00:25 Nan
02-01-2018 00:30 Nan
02-01-2018 00:35 3
02-01-2018 00:40 Nan
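To experiment with the sample above, the frame can be rebuilt in a few lines. This is only a sketch: the variable names `stamps` and `values` are made up for illustration, the timestamps are kept as day-first strings as in the question, and real `np.nan` values stand in for the `Nan` text.

```python
import numpy as np
import pandas as pd

# Two days, nine 5-minute slots each, written day-first as in the question
stamps = [
    f"{day} 00:{minute:02d}"
    for day in ("01-01-2018", "02-01-2018")
    for minute in range(0, 45, 5)
]

# np.nan replaces the "Nan" text shown in the sample
values = [
    1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,   # 1 January
    np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan,   # 2 January
]

df = pd.DataFrame({"Datetime": stamps, "X": values})
print(df)
```

The `Datetime` column still holds strings at this point, so it needs to be parsed with `dayfirst=True` before any date logic is applied.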
Recommended answer
Use:
import pandas as pd

# convert X to float and Datetime to real datetimes (day comes first)
df['X'] = df['X'].astype(float)
df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
# True where X is missing
s = df['X'].isna()
# label each run of consecutive equal values with its own id
g = s.ne(s.shift()).cumsum()
# calendar date of each row
dates = df['Datetime'].dt.date
# length of each NaN run, counted separately per date
sizes = s.groupby([g[s], dates[s]]).transform('count')
# rows whose run length equals the longest run of their date
mask = sizes.groupby(dates).transform('max').eq(sizes)
# set those rows to 0
df.loc[mask, 'X'] = 0
print(df)
Datetime X
0 2018-01-01 00:00:00 1.0
1 2018-01-01 00:05:00 NaN
2 2018-01-01 00:10:00 2.0
3 2018-01-01 00:15:00 3.0
4 2018-01-01 00:20:00 2.0
5 2018-01-01 00:25:00 0.0
6 2018-01-01 00:30:00 0.0
7 2018-01-01 00:35:00 0.0
8 2018-01-01 00:40:00 4.0
9 2018-01-02 00:00:00 NaN
10 2018-01-02 00:05:00 2.0
11 2018-01-02 00:10:00 2.0
12 2018-01-02 00:15:00 2.0
13 2018-01-02 00:20:00 2.0
14 2018-01-02 00:25:00 0.0
15 2018-01-02 00:30:00 0.0
16 2018-01-02 00:35:00 3.0
17 2018-01-02 00:40:00 NaN
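The line `g = s.ne(s.shift()).cumsum()` carries most of the logic, so it may help to see it on a tiny boolean Series. This is a sketch with made-up data, not part of the answer itself: every time a value differs from the previous one, the cumulative sum increases, so each run of equal values gets its own label.

```python
import pandas as pd

# True marks a missing value; runs of True have lengths 1 and 3
s = pd.Series([False, True, False, False, True, True, True, False])

# A new label starts wherever the value differs from the previous row
g = s.ne(s.shift()).cumsum()
print(g.tolist())

# Length of each run of missing values, keyed by run label
run_lengths = s[s].groupby(g[s]).size()
print(run_lengths)
```

Grouping by both the run label and the date, as the answer does, additionally splits a run that crosses midnight into one piece per day.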
To replace values only on certain dates, create a list named filtered with the dates to process, build a mask from it, and chain it with the missing-value mask using & (bitwise AND):
sizes = s.groupby([g[s & m], dates[s & m]]).transform('count')
All together:
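# convert X to float and Datetime to real datetimes (day comes first)
df['X'] = df['X'].astype(float)
df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
# True where X is missing
s = df['X'].isna()
# label each run of consecutive equal values with its own id
g = s.ne(s.shift()).cumsum()
# date of each row, kept as a datetime so it can be matched against the filter
dates = df['Datetime'].dt.floor('d')
# dates to process, converted explicitly to datetimes before calling isin
filtered = pd.to_datetime(['2018-01-01', '2019-01-01'])
m = dates.isin(filtered)
# length of each NaN run on the selected dates only
sizes = s.groupby([g[s & m], dates[s & m]]).transform('count')
# rows whose run length equals the longest run of their date
mask = sizes.groupby(dates).transform('max').eq(sizes)
# set those rows to 0
df.loc[mask, 'X'] = 0
print(df)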
Datetime X
0 2018-01-01 00:00:00 1.0
1 2018-01-01 00:05:00 NaN
2 2018-01-01 00:10:00 2.0
3 2018-01-01 00:15:00 3.0
4 2018-01-01 00:20:00 2.0
5 2018-01-01 00:25:00 0.0
6 2018-01-01 00:30:00 0.0
7 2018-01-01 00:35:00 0.0
8 2018-01-01 00:40:00 4.0
9 2018-01-02 00:00:00 NaN
10 2018-01-02 00:05:00 2.0
11 2018-01-02 00:10:00 2.0
12 2018-01-02 00:15:00 2.0
13 2018-01-02 00:20:00 2.0
14 2018-01-02 00:25:00 NaN
15 2018-01-02 00:30:00 NaN
16 2018-01-02 00:35:00 3.0
17 2018-01-02 00:40:00 NaN
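If this is needed in more than one place, the steps can be wrapped in a helper. The following is a sketch under a few assumptions, not the answer's own code: the function name `zero_longest_nan_runs` and its parameters are hypothetical, the datetime column is assumed to be parsed already, and the index is assumed to be unique. It subsets to the missing rows before grouping, so the group keys never need index alignment.

```python
import numpy as np
import pandas as pd


def zero_longest_nan_runs(df, value_col="X", time_col="Datetime", only_dates=None):
    """Return a copy of df with the longest NaN run(s) of each date set to 0.

    Hypothetical helper; only_dates optionally restricts the dates processed.
    """
    out = df.copy()
    is_na = out[value_col].isna()
    # label runs of consecutive equal values
    run_id = is_na.ne(is_na.shift()).cumsum()
    day = out[time_col].dt.floor("D")

    pick = is_na
    if only_dates is not None:
        pick = is_na & day.isin(pd.to_datetime(list(only_dates)))
    if not pick.any():
        return out  # nothing to replace

    # length of every NaN run, split at date boundaries
    run_len = is_na[pick].groupby([run_id[pick], day[pick]]).transform("count")
    # longest run length per date
    longest = run_len.groupby(day[pick]).transform("max")
    out.loc[run_len.index[run_len.eq(longest)], value_col] = 0
    return out


# Sample data from the question, with timestamps already parsed
stamps = pd.to_datetime(
    [f"{d} 00:{m:02d}" for d in ("2018-01-01", "2018-01-02") for m in range(0, 45, 5)]
)
values = [1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,
          np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan]
sample = pd.DataFrame({"Datetime": stamps, "X": values})

all_days = zero_longest_nan_runs(sample)
jan1_only = zero_longest_nan_runs(sample, only_dates=["2018-01-01"])
print(jan1_only)
```

Note that when two runs on the same date tie for the longest length, both are set to zero, which matches the behaviour of comparing with `eq` in the code above.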