问题描述
我想使用python从此数据中检测出峰:
I would like to detect the peaks from this data using python:
data = [1.0, 0.35671858559485703, 0.44709399319470694, 0.29438948200831194, 0.5163825635166547, 0.3036363865322419, 0.34031782308777747, 0.2869558046065574, 0.28190537831716, 0.2807516154537239, 0.34320479518313507, 0.21117275536958913, 0.30304626765388043, 0.4972542099530442, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.18200891715227194, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.28830608331168983, 0.057156776746163526, 0.043418555819326035, 0.022527521866967784, 0.035414574439784685, 0.062273775107322626, 0.04569227783752021, 0.04978915781132807, 0.0599089458581528, 0.05692515997545401, 0.05884619933405206, 0.0809943356922021, 0.07466587894671428, 0.08548458657792352, 0.049216679971411645, 0.04742180324984401, 0.05822208549398862, 0.03465282733964001, 0.014005094192867372, 0.052004161876744344, 0.061297263734617496, 0.01867087951563289, 0.01390993522118277, 0.021515814095838564, 0.025260618727204275, 0.0157022555745128, 0.041999490119172936, 0.0441231248537558, 0.03079711140612242, 0.04177946154195037, 0.047476050325192885, 0.05087930020034335, 0.03889899267688956, 0.02114033158686702, 0.026726959895528927, 0.04623461918879543, 0.05426474524591766, 0.04421866212189775, 0.041911901968304605, 0.019982199103543322, 0.026520396430805435, 0.03952286472888431, 0.03842652984978244, 0.02779682035551695, 0.02043518392128019, 0.07706934170969436]
您可以绘制它:
import matplotlib.pyplot as plt
plt.plot(data)
我用红色圈出了要自动检测的峰.
I encircled the peaks that I would like to automatically detect in red.
峰值特征:
我感兴趣的是找到峰值,此后对于某些数据点(即3-4个),信号相对平稳.平滑是指振幅变化在峰值之后的数据点之间是可比较的.我猜想,这意味着更多的数学术语:峰,在某些数据点之后,如果要拟合直线,则斜率将接近0.
I am interested in finding peaks after which, for some data points (i.e. 3-4), the signal is relatively smooth. By smooth I mean that the changes in amplitudes are comparable between the data-points after the peak. I guess, that this means in more mathematical terms: Peaks, after which for some datapoints, if you were to a fit a linear line, then the slope would be close to 0.
到目前为止我已经尝试过:
我认为元素之间的差异(将0表示相同的长度)会更好地揭示峰:
I thought that the difference between the elements (appending 0 to have the same length) would reveal the peaks much better:
diff_list = []
# Append 0 to have the same length as data
data_d = np.append(data,0)
for i in range(len(data)):
diff = data_d[i]-data_d[i+1]
# If difference is samller than 0, I set it to 0 -> Just interested in "falling" peaks
if diff < 0:
diff = 0
diff_list= np.append(diff_list,diff)
当我绘制diff_list
时,它看起来已经好多了:
When I plot diff_list
it looks already much better:
但是,简单的阈值峰值检测算法不起作用,因为第一部分中的噪声与后来的峰值具有相同的幅度.
However, a simple threshold value peak-detection algorithm does not work, because the noise in the first section has the same amplitude as the peak later on.
因此,我需要一种能够稳健地找到峰值的算法,或者一种能够大幅降低噪声而又不至于对峰值造成很大衰减的方法,而最重要的是无需移动它们.有人有主意吗?
I came across this blog and tried this method:
peaks_d = detect_peaks(diff_list, mph=None, mpd=4, threshold=0.1, edge='falling', kpsh=False, valley=False, show=False, ax=None)
plt.plot(diff_list)
plt.plot(peaks_d[:-1], diff_list[peaks_d[:-1]], "x")
plt.show()
...但是我得到了:
...but I got:
...真的,我相信我需要更多的预处理.
...so really, I believe that I need some more pre-processing.
所以我尝试计算梯度:
plt.plot(np.gradient(data))
但是,噪声中的梯度可与其中一个峰相媲美:
However, the gradient within the noise is comparable to one of the peaks:
可以使用什么:
->噪声:在彼此附近的位置上有许多相似的振幅点.也许有人可以检测到这些区域并过滤掉它们(即将它们设置为0)
-> Noise: There are a multitude of similar amplitude points in a near location to each other. Maybe one could detect those areas and filter them out (i.e. set them to 0)
我尝试遵循此方法 :
# Data
y = diff_list.tolist()
# Settings: lag = 30, threshold = 5, influence = 0
lag = 10
threshold = 0.1
influence = 1
# Run algo with settings from above
result = thresholding_algo(y, lag=lag, threshold=threshold, influence=influence)
# Plot result
plt.plot(result["signals"])
但是,我得到了:
基于@Jussi Nurminen的评论:
Based on a comment from @Jussi Nurminen:
import scipy.signal as sg
max_places = (np.array(sg.argrelmax(diff_list))[0]).tolist()
plt.plot(diff_list)
plt.plot(max_places, diff_list[max_places], "x")
plt.show()
peaks = []
for check in max_places:
if check+5 < len(diff_list):
gr = abs(np.average(np.gradient(diff_list[check+1: check+5])))
if gr < 0.01:
peaks.append(check)
plt.plot(diff_list)
plt.plot(peaks[:-1], diff_list[peaks[:-1]], "x")
plt.show()
修改5:
以下是用于测试任何算法的相似数据:
Here is similar data to test any algorithm:
data2 = [1.0, 0.4996410902399043, 0.3845950995707942, 0.38333441505960125, 0.3746384799687852, 0.28956967636700215, 0.31468441185494306, 0.5109048238958792, 0.5041481423190644, 0.41629226772762024, 0.5817609846838199, 0.3072152962171569, 0.5870564826981163, 0.4233247394608264, 0.5943712016644392, 0.4946091070102793, 0.36316740988182716, 0.4387555870158762, 0.45290920032442744, 0.48445358617984213, 0.8303387875295111, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.29678306715530073, 0.10146278147135124, 0.10120143287506084, 0.10330143251114839, 0.0802259786323741, 0.06858944745608002, 0.04600545347437729, 0.014440053029463367, 0.019023393725625705, 0.045201054387436344, 0.058496635702267374, 0.05656947149500993, 0.0463696266116956, 0.04903205756575247, 0.02781307505224703, 0.044280150764466876, 0.03746976646628557, 0.021526918040025544, 0.0038244080425488013, 0.008617907527160991, 0.0112760689575489, 0.009157686770957874, 0.013043259260489413, 0.01621417695776057, 0.016502269315028423, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3210019708643843, 0.11441868790191953, 0.12862935834434436, 0.08790971283197381, 0.09127615787146504, 0.06360039847679771, 0.032247149009635476, 0.07225952295002563, 0.095632185243862, 0.09171396569135751, 0.07935726217072689, 0.08690487354356599, 0.08787369092132288, 0.04980466729311508, 0.05675819557118429, 0.06826614158574265, 0.08491084598657253, 0.07037944101030547, 0.06549710463329293, 0.06429902857281444, 0.07282805735716101, 0.0667027178198566, 0.05590329380937183, 0.05189048980041104, 0.04609913889901785, 0.01884014489167378, 0.02782496113905073, 0.03343588833365329, 0.028423168106849694, 0.028895130687196867, 0.03146961123393891, 0.02287127937400026, 0.012173655214339595, 0.013332601407407033, 0.014040309216796854, 0.003450677642354792, 0.010854992025496528, 0.011804042414950701, 0.008100266690771957, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.18547803170164875, 0.008457776819382444, 0.006607607749756658, 0.008566964920042127, 0.024793283595437438, 0.04334031667011553, 0.012330921737457376, 0.00994343436054472, 0.008003962298473758, 0.0025523166577987263, 0.0009309499302016907, 0.0027602202618852126, 0.0034442123857338675, 0.0006448449815386562, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
使用@jojo的答案,并选择适当的参数(dy_lim = 0.1
和di_lim = 10
,结果很接近,但是添加了一些不应该是峰值的点.
Using @jojo's answer, and choosing appropriate parameters (dy_lim = 0.1
and di_lim = 10
, the result is close, but there were some points added which should not be peaks.
修改5:
但是,另一种情况.
data = [1.0, 0.0, -0.0, 0.014084507042253521, 0.0, -0.0, 0.028169014084507043, 0.0, -0.0, 0.014084507042253521, 0.0, 0.0, 0.39436619718309857, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.7887323943661971, 0.11267605633802817, 0.2535211267605634, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.4084507042253521, -0.0, 0.04225352112676056, 0.014084507042253521, 0.014084507042253521, 0.0, 0.28169014084507044, 0.04225352112676056, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.5633802816901409, -0.0, -0.0, -0.0, -0.0, 0.0, 0.08450704225352113, -0.0, -0.0, -0.0, -0.0, 0.0, 0.30985915492957744, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.9295774647887324]
在这里几乎可以正确检测到所有峰,但是只有一个.
Here almost all peaks are detected correctly but one.
推荐答案
这是一个务实的解决方案,按照我的看法(如果我输入错了,请更正),您想在平滑"或"0期".
This is a pragmatic solution, as the way I see this (please correct me if I'm wrong) you want to find each peak after/before a 'smooth' or 0 period.
您可以通过简单地检查这些时间段并报告其开始和结束时间来完成此操作.
You can do this by simply checking for such periods and reporting their start and stop.
这是一个非常基本的实现,允许指定什么才是smooth
期限(我在这里使用小于0.001的变化作为条件):
Here is a very basic implementation, allowing to specify what qualifies as smooth
period (I use a change of less than 0.001 as condition here):
dy_lim = 0.001
targets = []
in_lock = False
i_l, d_l = 0, data[0]
for i, d in enumerate(data[1:]):
if abs(d_l - d) > dy_lim:
if in_lock:
targets.append(i_l)
targets.append(i + 1)
in_lock = False
i_l, d_l = i, d
else:
in_lock = True
然后绘制targets
:
plt.plot(range(len(data)), data)
plt.scatter(targets, [data[t] for t in targets], c='red')
plt.show()
没什么特别详细的说明,但是它找到了您指示的峰值.
Nothing very elaborated, but it finds the peak you indicated.
增加dy_lim
的值将使您发现更多的峰.另外,您可能希望指定一个平稳期的最小长度,这就是它的样子(再次只是一个粗略的实现):
Increasing the value of dy_lim
will let you find more peaks. Also you might want to specify a minimal length of what is a smooth period, here is how this might look like (again just a crude implementation):
dy_lim = 0.001
di_lim = 50
targets = []
in_lock = False
i_l, d_l = 0, data[0]
for i, d in enumerate(data[1:]):
if abs(d_l - d) > dy_lim:
if in_lock:
in_lock = False
if i - i_l > di_lim:
targets.append(i_l)
targets.append(i + 1)
i_l, d_l = i, d
else:
in_lock = True
使用此方法,您不会得到第一个点,因为第一和第二个之间的差大于di_lim=50
.
With this you would not get the first point as the difference between first and 2nd is bigger than di_lim=50
.
第二个数据集的更新:
这有点棘手,因为现在在峰值之后逐渐减少,导致差异缓慢聚集,足以达到dy_lim
,导致算法错误地报告新目标.因此,您需要测试该目标是否真的是一个峰值,然后只报告.
This gets a bit trickier, as now there are gradual decreases after a peak leading to a slow aggregation of difference, enough to hit the dy_lim
leading the algorithm to falsely report a new target. So you need to test whether this target really is a peak and only report then.
这是如何实现此目标的粗略实现:
Here is a crude implementation of how to achieve this:
dy_lim = 0.1
di_lim = 5
targets = []
in_lock = False
i_l, d_l = 0, data[0]
for i, d in enumerate(data[1:]):
if abs(d_l - d) > dy_lim:
if in_lock:
in_lock = False
if i - i_l > di_lim:
# here we check whether the start of the period was a peak
if abs(d_l - data[i_l]) > dy_lim:
# assure minimal distance if previous target exists
if targets:
if i_l - targets[-1] > di_lim:
targets.append(i_l)
else:
targets.append(i_l)
# and here whether the end is a peak
if abs(d - data[i]) > dy_lim:
targets.append(i + 1)
i_l, d_l = i, d
else:
in_lock = True
您最终将得到以下结果:
What you'll end up with is this:
一般说明:我们在这里采用一种自下而上的方法:您具有要检测的特定功能,因此您要编写特定的算法来实现此目的.
General Note: We are following a bottom-up approach here: You have a specific feature you want to detect, so you write a specific algorithm to do so.
这对于简单的任务可能非常有效,但是,我们已经在这个简单的示例中意识到,如果有新功能,算法应该能够应对,我们需要对其进行调整.如果当前的复杂性已足够,那么您就可以了.但是,如果数据还提供其他模式,那么您将再次处于需要添加更多条件的情况,并且由于算法需要处理额外的复杂性,因此该算法变得越来越复杂.如果最终遇到这种情况,则可能需要考虑换档并采用更真实的方法.在这种情况下,有很多选择,一种方法是使用 Savizky-Golay 过滤版本,但这只是一个可以在此处提出的建议之一.
This can be very effective for simple tasks, however, we realize already in this simple example that if there are new features the algorithm should be able to cope with we need to adapt it. If the current complexity is all there is, then you are fine. But if the data presents yet other patterns, then you'll be again in the situation where you need to add further conditions and the algorithm becomes more and more complicated as it needs to deal with the additional complexity. If you end up in such a situation then you might want to consider switching gears and adapt a more genuine approach. There are many options in this case, one way would be to work with the difference of the original data with a Savizky-Golay filtered version, but that's just one of many suggestions one could make here.
这篇关于智能峰值检测方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!