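I have a large dataset (>200k points) and I am trying to replace sequences of zeros with a value. A run of more than 2 consecutive zeros is an artifact and should be removed by setting it to np.nan.
I have read "Searching a sequence in a NumPy array", but it does not quite fit my requirements, because I do not have a static pattern.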
np.array([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0, 1.0, 16.0, 0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])
# should be converted to this
np.array([0, 1.0, 0, 0, -6.0, 13.0, np.nan, np.nan, np.nan, 1.0, 16.0, np.nan, np.nan, np.nan, np.nan, 1.0, 1.0, 1.0, 1.0])
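If you need any more information, let me know.
Thanks in advance!
Results:
Thanks for the answers. Here are my (unprofessional) test results on 288240 points: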
divakar took 0.016000ms to replace 87912 points
desiato took 0.076000ms to replace 87912 points
polarise took 0.102000ms to replace 87912 points
Since @divakar's solution is the shortest and the fastest, I am accepting his answer.
Best answer
This is basically a binary closing operation with a threshold requirement on the size of the gap to be closed. Here is an implementation based on it:
import numpy as np
from scipy.ndimage import binary_closing

# a is the input 1D float array
# Pad with ones so as to make binary closing work around the boundaries too
a_extm = np.hstack((True,a!=0,True))
# Perform binary closing and look for the ones that have not changed, indicating
# the gaps in those cases were above the threshold requirement for closing
mask = a_extm == binary_closing(a_extm,structure=np.ones(3))
# Out of those avoid the 1s from the original array and set rest as NaNs
out = np.where(~a_extm[1:-1] & mask[1:-1],np.nan,a)
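To try the padded-closing idea on the sample array from the question, it can be wrapped in a small function. This is only a minimal sketch, assuming SciPy is installed; the function name `nan_long_zero_runs_closing` is made up for illustration, and the mask is written in a slightly condensed form.

```python
import numpy as np
from scipy.ndimage import binary_closing


def nan_long_zero_runs_closing(a):
    """Set runs of 3 or more zeros to NaN (hypothetical helper name)."""
    a = np.asarray(a, dtype=float)
    # Pad with True on both sides so runs touching the ends are treated
    # like runs enclosed by nonzero values.
    nonzero = np.hstack((True, a != 0, True))
    # A 3-wide structuring element closes gaps of 1 or 2 zeros only.
    closed = binary_closing(nonzero, structure=np.ones(3))
    # Inside the padding, nonzero entries stay True after closing, so the
    # positions that are still False are zeros belonging to runs of 3+.
    still_open = ~closed[1:-1]
    return np.where(still_open, np.nan, a)


a = np.array([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0, 1.0, 16.0,
              0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])
expected = np.array([0, 1.0, 0, 0, -6.0, 13.0, np.nan, np.nan, np.nan,
                     1.0, 16.0, np.nan, np.nan, np.nan, np.nan,
                     1.0, 1.0, 1.0, 1.0])
out = nan_long_zero_runs_closing(a)
# assert_array_equal treats NaNs in matching positions as equal
np.testing.assert_array_equal(out, expected)
```

The input is copied to a float array first, since NaN cannot be stored in an integer array.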
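The padding in the previous method is only there to handle the boundary elements, and it can be a bit expensive on a large dataset. It can be avoided like so: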
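# Create binary closed mask; keep only zeros in it, because without padding
# the closing may not preserve a nonzero first or last element (border handling)
mask = ~binary_closing(a!=0,structure=np.ones(3)) & (a==0)
# Positions of the nonzero elements (assumes a has at least one)
idx = np.where(a)[0]
# Leading and trailing runs of zeros: decide by their length directly
mask[:idx[0]] = idx[0]>=3
mask[idx[-1]+1:] = a.size - idx[-1] -1 >=3
# Use the mask to set NaNs in a
out = np.where(mask,np.nan,a)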
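As a cross-check that does not depend on SciPy, the same rule (runs of 3 or more zeros become NaN) can also be sketched with run lengths in plain NumPy. This is not the binary-closing method from this answer but a swapped-in alternative; the function name `nan_long_zero_runs_rle` and the `min_len` parameter are assumptions made for illustration.

```python
import numpy as np


def nan_long_zero_runs_rle(a, min_len=3):
    """Set runs of at least min_len zeros to NaN (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    is_zero = a == 0
    # Pad with False so every run of zeros has both a start and an end edge.
    padded = np.concatenate(([False], is_zero, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    lengths = ends - starts
    # Expand the per-run decision to one flag per zero element, in order.
    in_long_run = np.repeat(lengths >= min_len, lengths)
    out = a.copy()
    out[np.flatnonzero(is_zero)[in_long_run]] = np.nan
    return out


a = np.array([0, 1.0, 0, 0, -6.0, 13.0, 0, 0, 0, 1.0, 16.0,
              0, 0, 0, 0, 1.0, 1.0, 1.0, 1.0])
out = nan_long_zero_runs_rle(a)
print(out)
```

Comparing its output with the closing-based result on real data is a cheap way to catch boundary mistakes, since the two approaches treat leading and trailing runs through different code paths.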