What is the pythonic way to find step (or spike) shapes in a time series?

Question

I have a pandas DataFrame (e.g. df) in which some values suddenly jump (like a step or a spike). What is the best way to identify them?

I have written a very naive piece of code that computes the differences between each value and the one or two values before and after it. By comparing those differences, the program decides whether the point belongs to a step or a spike.

import numpy as np
import pandas as pd

# create a dataframe of 25 hourly random values
# (freq='H' is deprecated since pandas 2.2; use freq='h' there)
df = pd.DataFrame(np.random.randn(25), index=pd.date_range(start='2010-1-1', end='2010-1-2', freq='H'), columns=['value'])

# overwrite a few rows (by position) to create a jump
df.iloc[10:11] = -0.933463
df.iloc[11:12] = 15
df.iloc[12:13] = 15
df.iloc[13:14] = 15

# absolute differences between each value and its previous (p) and next (n) neighbours
df_diff = pd.DataFrame()
df_diff['p1'] = df['value'].diff(periods=1).abs()
df_diff['p2'] = df['value'].diff(periods=2).abs()
df_diff['n1'] = df['value'].diff(periods=-1).abs()
df_diff['n2'] = df['value'].diff(periods=-2).abs()

threshold = 5  # smallest difference that counts as a jump (renamed from max to avoid shadowing the built-in)
results = (df_diff['n1'] > threshold) & (df_diff['n1'] == df_diff['n2']) & (df_diff['p1'] == 0)

What I expect is:

2010-01-01 00:00:00    False
2010-01-01 01:00:00    False
2010-01-01 02:00:00    False
2010-01-01 03:00:00    False
2010-01-01 04:00:00    False
2010-01-01 05:00:00    False
2010-01-01 06:00:00    False
2010-01-01 07:00:00    False
2010-01-01 08:00:00    False
2010-01-01 09:00:00    False
2010-01-01 10:00:00    True
2010-01-01 11:00:00    True
2010-01-01 12:00:00    True
2010-01-01 13:00:00    True
2010-01-01 14:00:00    True
2010-01-01 15:00:00    False
2010-01-01 16:00:00    False
2010-01-01 17:00:00    False
2010-01-01 18:00:00    False
2010-01-01 19:00:00    False
2010-01-01 20:00:00    False
2010-01-01 21:00:00    False
2010-01-01 22:00:00    False
2010-01-01 23:00:00    False
2010-01-02 00:00:00    False


Recommended answer

The value you chose for the downward peak (df[10:11] = -0.933463) is too close to the other lows to be distinguished from them without more information.

So I changed this value to -7.
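To illustrate why the original value is hard to separate from the noise, here is a minimal sketch that runs the same kind of prominence-based search on the negated series for both values. The helper name `down_peak_positions` is hypothetical, and the seed and the prominence threshold of 4 are taken from the answer's code; it is only meant as a quick check, not as part of the answer itself.

```python
import numpy as np
from scipy.signal import find_peaks


def down_peak_positions(low_value, prominence=4):
    """Positions of downward peaks in the sample series (hypothetical helper)."""
    np.random.seed(42)          # same seed as the answer, so the noise is reproducible
    values = np.random.randn(25)
    values[10] = low_value      # candidate downward spike
    values[11:14] = 15          # the plateau from the question
    # negate the series so that lows become peaks for find_peaks
    positions, _ = find_peaks(-values, prominence=prominence, plateau_size=1)
    return positions.tolist()


original = down_peak_positions(-0.933463)
adjusted = down_peak_positions(-7)
print("low = -0.933463:", original)
print("low = -7:", adjusted)
```

With this seed, position 10 should be reported only for the value -7; at -0.933463 the dip is not prominent enough to pass the threshold.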

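from scipy.signal import find_peaks
import pandas as pd
import numpy as np

# create a reproducible dataframe of 25 hourly random values
# (freq='H' is deprecated since pandas 2.2; use freq='h' there)
np.random.seed(42)
df = pd.DataFrame(np.random.randn(25), index=pd.date_range(start='2010-1-1', end='2010-1-2', freq='H'), columns=['value'])

# overwrite a few rows to create a downward spike followed by a plateau
df[10:11] = -7
df[11:12] = 15
df[12:13] = 15
df[13:14] = 15

# find_peaks returns (peak positions, properties); plateau_size=1 makes it
# also report the left and right edges of flat peaks
peaks_up = find_peaks(df.value, prominence=4, plateau_size=1)
# negate the series so that downward spikes are found as peaks
peaks_down = find_peaks(-df.value, prominence=4, plateau_size=1)

# collect the peak positions and plateau edges of both searches
peaks_idx = np.unique(
    np.concatenate(
        [peaks_up[1]['left_edges'], peaks_up[0], peaks_up[1]['right_edges'],
         peaks_down[1]['left_edges'], peaks_down[0], peaks_down[1]['right_edges']],
        axis=0))
peaks_df = df.iloc[peaks_idx]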
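The question asks for a boolean Series rather than a list of positions, and the code above only keeps the edges and midpoint of each plateau, so a plateau longer than three samples would have unmarked interior points. A minimal sketch of one way to handle both, assuming the same prominence threshold, is to mark every sample between each pair of `left_edges` and `right_edges`. The function name `peak_mask` and the Series name `is_peak` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks


def peak_mask(values, prominence=4):
    """Boolean mask marking every sample inside an upward or downward peak or plateau."""
    values = np.asarray(values, dtype=float)
    mask = np.zeros(values.size, dtype=bool)
    # search the series and its negation, so both highs and lows are covered
    for signal in (values, -values):
        _, props = find_peaks(signal, prominence=prominence, plateau_size=1)
        for left, right in zip(props["left_edges"], props["right_edges"]):
            mask[left:right + 1] = True  # fill the whole plateau, edges included
    return mask


# same sample data as in the answer
np.random.seed(42)
values = np.random.randn(25)
values[10] = -7
values[11:14] = 15

# periods=25 between the two dates gives the same hourly timestamps
index = pd.date_range(start="2010-01-01", end="2010-01-02", periods=25)
flags = pd.Series(peak_mask(values), index=index, name="is_peak")
print(flags)
```

On the sample data this should mark 10:00 through 13:00. The question's expected output also marks 14:00, which this approach does not treat as part of a peak.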

To plot:

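import matplotlib.pyplot as plt
import seaborn as sns

# recent seaborn versions require x and y to be passed as keyword arguments
sns.lineplot(x=df.index, y=df.value)
# mark the detected peak points in red
plt.scatter(peaks_df.index, peaks_df.value, color="red")
plt.show()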

