Problem description
I want to implement outlier detection that uses a window to check whether the next element is an outlier. Say we use a window of length 3 on a pd.Series like [0,1,2,3,4]: I would calculate the median and MAD (median absolute deviation), or the mean and std, on [0,1,2] and check whether 3 is an outlier.
I implemented a for-loop solution, but it's really slow.
Recommended answer
Say you start with
import pandas as pd
s = pd.Series([1, 2, 1, 4, 2000, 2])
Then, using rolling, the following shows that the 5th element is more than 200 away from the median of the length-3 window ending at that element:
(s - s.rolling(3).median()).abs() > 200
0 False
1 False
2 False
3 False
4 True
5 False
dtype: bool
It is vectorized, and therefore should be much faster than a for loop.
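Note that s.rolling(3) includes the current element in its own window, whereas the question compares each element against the window before it. A minimal sketch of that variant follows, shifting the rolling statistics down by one row and using a rolling MAD for the cutoff instead of a fixed 200. The names k and min_dev and their values are assumptions chosen for illustration, not part of the original answer; the floor min_dev is there because the MAD of a very short window can be 0.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 1, 4, 2000, 2])
window = 3

# Median of the `window` values *before* each element:
# shift(1) moves every rolling result down one row, so row i
# sees the window that ends at row i - 1.
prev_median = s.rolling(window).median().shift(1)

def mad(w):
    # Median absolute deviation of one window (w is a NumPy array).
    return np.median(np.abs(w - np.median(w)))

# rolling().apply() calls a Python function once per window, so it is
# slower than the built-in median()/mean()/std(), but it still avoids
# a hand-written loop over the Series.
prev_mad = s.rolling(window).apply(mad, raw=True).shift(1)

# Hypothetical cutoff: more than k MADs from the previous median,
# with a floor of min_dev so a zero MAD does not flag everything.
k = 5
min_dev = 10
threshold = (k * prev_mad).clip(lower=min_dev)

# Rows without a full preceding window hold NaN and compare as False.
is_outlier = (s - prev_median).abs() > threshold
print(is_outlier)
```

For mean and std instead of median and MAD, s.rolling(window).mean().shift(1) and s.rolling(window).std().shift(1) can be swapped in the same way.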