Grouping values into intervals when the data contains np.nan
Problem description
I have a pandas Series containing zeros, ones and np.nan:
import pandas as pd
import numpy as np
df1 = pd.Series([ 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, np.nan, np.nan, 1])
df1
Out[6]:
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
5 1.0
6 1.0
7 1.0
8 0.0
9 0.0
10 0.0
11 NaN
12 NaN
13 1.0
dtype: float64
I would like to create a DataFrame df2 that contains the start and end of each interval of consecutive equal values, together with the value associated with that interval. In this case df2 should be:
df2
Out[5]:
Start End Value
0 0 4 0
1 5 7 1
2 8 10 0
3 11 12 NaN
4 13 13 1
I followed the solution here:
s = df1.ne(df1.shift()).cumsum()
# Parentheses let the method chain continue across lines.
df2 = (df1.groupby(s)
          .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
                                     index=['Start', 'End', 'Value']))
          .unstack().reset_index(drop=True))
but it does not work in this case, because the two consecutive NaN rows end up in separate intervals:
df2
Out[11]:
Start End Value
0 0.0 4.0 0.0
1 5.0 7.0 1.0
2 8.0 10.0 0.0
3 11.0 11.0 NaN
4 12.0 12.0 NaN
5 13.0 13.0 1.0
Recommended answer
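NaNs are a problem for the equality check: NaN never compares equal to NaN, so every NaN row is treated as the start of a new group. You can work around this by temporarily filling the NaNs with a placeholder value that does not otherwise occur in the data.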
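The comparison behaviour can be checked directly. This is a minimal sketch; the short series `ser` is a made-up example, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two equal values, two consecutive NaNs, one more value.
ser = pd.Series([0, 0, np.nan, np.nan, 1])

# True wherever a row differs from the previous row.
changed = ser.ne(ser.shift())
print(changed.tolist())
```

Both NaN rows are flagged as changes, even though they follow each other, which is why a cumulative sum over this mask puts them into separate groups.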
In [361]: s = df1.fillna('-dummy-').ne(df1.fillna('-dummy-').shift()).cumsum()
In [362]: (df1.groupby(s)
     ...:     .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
     ...:                                index=['Start', 'End', 'Value']))
     ...:     .unstack().reset_index(drop=True))
Out[362]:
Start End Value
0 0.0 4.0 0.0
1 5.0 7.0 1.0
2 8.0 10.0 0.0
3 11.0 12.0 NaN
4 13.0 13.0 1.0
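If you would rather not introduce a placeholder value, a possible alternative is to treat "both values are NaN" as equal explicitly. The following is only a sketch under that assumption, not the method from the answer above; the names `ser`, `prev`, `same`, `run_id` and `positions` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, np.nan, np.nan, 1])

prev = ser.shift()
# A row continues the current run if it equals the previous row,
# or if both the row and the previous row are NaN.
same = ser.eq(prev) | (ser.isna() & prev.isna())
# Every row that does not continue a run starts a new one.
run_id = (~same).cumsum()

# Group the index labels themselves to find where each run starts and ends.
positions = pd.Series(ser.index, index=ser.index)
df2 = pd.DataFrame({
    "Start": positions.groupby(run_id).first(),
    "End": positions.groupby(run_id).last(),
    # first() skips NaN, so an all-NaN run still yields NaN here.
    "Value": ser.groupby(run_id).first(),
}).reset_index(drop=True)

print(df2)
```

Because the Start and End columns are built from the index rather than mixed into one Series with the values, they keep their integer dtype instead of being converted to floats.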