Plotting a range of flow duration curves for several time series in Python


Problem description

Flow duration curves are a common way in hydrology (and other fields) to visualize timeseries. They allow an easy assessment of the high and low values in a timeseries and of how often certain values are reached. Is there an easy way to plot them in Python? I could not find a matplotlib tool that does this. No other package seems to include it either, at least not with the option to easily plot a range of flow duration curves.

An example of a flow duration curve:

An explanation of how to create one in general can be found here: http://www.renewablesfirst.co.uk/hydropower/hydropower-learning-centre/what-is-a-flow-duration-curve/

So the basic calculation and plotting of a flow duration curve are pretty straightforward: simply calculate the exceedance and plot it against the sorted timeseries (see the answer by ImportanceOfBeingErnest). It gets more difficult, though, if you have several timeseries and want to plot the range of values for every exceedance probability. I present one solution in my answer to this thread, but would be glad to hear more elegant ones. My solution also makes it easy to draw the curve as a subplot, since it is common to have several timeseries for different locations that have to be plotted separately.
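A minimal NumPy-only sketch of that basic calculation follows. It sorts the values from high to low and assigns rank i (1-based) of n values an exceedance of i / n × 100 %. That is the same convention `plot_single_flow_duration_curve` uses in the answer's code. The helper name `exceedance_curve` is hypothetical.

```python
import numpy as np


def exceedance_curve(values):
    """Return (exceedance in %, values sorted high to low) for one timeseries.

    Hypothetical helper: rank i (1-based) of n values gets exceedance
    i / n * 100, i.e. the share of values that equal or exceed it.
    """
    sorted_desc = np.sort(np.asarray(values, dtype=float))[::-1]
    n = sorted_desc.size
    exceedance = np.arange(1, n + 1) / n * 100.0
    return exceedance, sorted_desc


# The largest value is equalled or exceeded 25 % of the time, the smallest 100 %
exc, flows = exceedance_curve([3.0, 10.0, 1.0, 5.0])
print(exc.tolist())    # [25.0, 50.0, 75.0, 100.0]
print(flows.tolist())  # [10.0, 5.0, 3.0, 1.0]
```

With matplotlib, `ax.plot(exc, flows)` then draws the curve.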

An example of what I mean by a range of flow duration curves:

Here you can see three distinct curves. The black line is the measured discharge of a river, while the two shaded areas show the range of all model runs for each of the two models. So what would be the easiest way to calculate and plot a range of flow duration curves for several timeseries?

Solution

EDIT: As my first answer was overly complicated and inelegant, I rewrote it to incorporate the solution by ImportanceOfBeingErnest. I still keep the new version here, alongside the one by ImportanceOfBeingErnest, because I think the additional functionality might make it easier for other people to plot flow duration curves for their timeseries. If someone has additional ideas, see: GitHub repository

Features are:

Changing the percentiles for a range flow duration curve
Easy usage as a standalone figure or as a subplot. If a subplot object is provided, the flow duration curve is drawn into it. If None is provided, the function creates one and returns it
Separate kwargs for the range curve and its comparison
Changing the y-axis to a logarithmic scale with a keyword
An extended example to help understand its usage
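The subplot feature follows a common matplotlib convention. The function draws into the Axes that is passed in, or creates one when `ax` is None, and returns it in both cases. A stripped-down sketch of that pattern follows. `plot_sorted` is a hypothetical name and not part of the answer's code, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_sorted(values, ax=None, **kwargs):
    """Hypothetical helper illustrating the ax=None convention.

    Draws into ``ax`` when one is given; otherwise creates a new figure
    and axes. Either way, the axes that were drawn into are returned.
    """
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(np.sort(values)[::-1], **kwargs)
    return ax


# Standalone: a new figure and axes are created
standalone_ax = plot_sorted([3, 1, 2])

# As a subplot: the line is drawn into the Axes that was passed in
fig, axes = plt.subplots(1, 2)
returned = plot_sorted([3, 1, 2], ax=axes[1], linewidth=0.5)
print(returned is axes[1])  # True
```

Returning the Axes lets the caller keep customizing it (titles, legends), as the usage example in the answer does with `set_title` and `legend`.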

The code is the following:

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 15 10:09:13 2018

@author: Florian Ulrich Jehn
"""
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


def flow_duration_curve(x, comparison=None, axis=0, ax=None, plot=True, 
                        log=True, percentiles=(5, 95), decimal_places=1,
                        fdc_kwargs=None, fdc_range_kwargs=None, 
                        fdc_comparison_kwargs=None):
    """
    Calculates and plots a flow duration curve from x. 

    All observations/simulations are ordered and the empirical probability is
    calculated. This is then plotted as a flow duration curve. 

    When x has more than one dimension along axis, a range flow duration curve 
    is plotted. This means that for every probability a min and max flow is 
    determined. This is then plotted as a fill between. 

    Additionally a comparison can be given to the function, which is plotted in
    the same ax.

    :param x: numpy array or pandas dataframe, discharge of measurements or 
    simulations
    :param comparison: numpy array or pandas dataframe of discharge that should
    also be plotted in the same ax
    :param axis: int, axis along which x is iterated through
    :param ax: matplotlib subplot object, if not None, will plot in that 
    instance
    :param plot: bool, if False function will not show the plot, but simply
    return the ax object
    :param log: bool, if True use a logarithmic y-axis
    :param percentiles: tuple of int, percentiles that should be used for 
    drawing a range flow duration curve
    :param fdc_kwargs: dict, matplotlib keywords for the normal fdc
    :param fdc_range_kwargs: dict, matplotlib keywords for the range fdc
    :param fdc_comparison_kwargs: dict, matplotlib keywords for the comparison 
    fdc

    return: subplot object with the flow duration curve in it
    """
    # Convert x to a pandas DataFrame for easier handling
    if not isinstance(x, pd.DataFrame):
        x = pd.DataFrame(x)

    # Make every column one timeseries (transpose if axis is not 0)
    if axis != 0:
        x = x.transpose()

    # Convert comparison to a DataFrame as well and orient it like x,
    # also when it was already passed in as a DataFrame
    if comparison is not None:
        if not isinstance(comparison, pd.DataFrame):
            comparison = pd.DataFrame(comparison)
        if axis != 0:
            comparison = comparison.transpose()

    # Create an ax if necessary
    if ax is None:
        fig, ax = plt.subplots(1,1)

    # Make the y scale logarithmic if needed
    if log:
        ax.set_yscale("log")

    # Determine whether this is a range or a single flow duration curve by
    # checking the number of columns (timeseries) of the dataframe.
    # A single column gives a single fdc. Use iloc so that this also works
    # for dataframes whose first column is not labelled 0
    if x.shape[1] == 1:
        plot_single_flow_duration_curve(ax, x.iloc[:, 0], fdc_kwargs)

    # Several columns give a range flow duration curve
    else:
        plot_range_flow_duration_curve(ax, x, percentiles, fdc_range_kwargs)

    # Add a comparison to the plot if it is present
    if comparison is not None:
        ax = plot_single_flow_duration_curve(ax, comparison.iloc[:, 0],
                                             fdc_comparison_kwargs)

    # Name the x-axis
    ax.set_xlabel("Exceedance [%]")

    # show if requested
    if plot:
        plt.show()

    return ax


def plot_single_flow_duration_curve(ax, timeseries, kwargs):
    """
    Plots a single fdc into an ax.

    :param ax: matplotlib subplot object
    :param timeseries: list like iterable
    :param kwargs: dict, keyword arguments for matplotlib

    return: subplot object with a flow duration curve drawn into it
    """
    # Get the probability
    exceedence = np.arange(1., len(timeseries) + 1) / len(timeseries)
    exceedence *= 100
    # Plot the curve, check for empty kwargs
    if kwargs is not None:
        ax.plot(exceedence, sorted(timeseries, reverse=True), **kwargs)
    else:
        ax.plot(exceedence, sorted(timeseries, reverse=True))
    return ax


def plot_range_flow_duration_curve(ax, x, percentiles, kwargs):
    """
    Plots a single range fdc into an ax.

    :param ax: matplotlib subplot object
    :param x: dataframe of several timeseries, one timeseries per column
    :param percentiles: tuple of int, lower and upper percentile that bound
    the drawn range
    :param kwargs: dict, keyword arguments for matplotlib

    return: subplot object with a range flow duration curve drawn into it
    """
    # Get the probabilities
    exceedence = np.arange(1., len(x) + 1) / len(x)
    exceedence *= 100

    # Sort the data
    sort = np.sort(x, axis=0)[::-1]

    # Get the percentiles
    low_percentile = np.percentile(sort, percentiles[0], axis=1)
    high_percentile = np.percentile(sort, percentiles[1], axis=1)

    # Plot it, check for empty kwargs
    if kwargs is not None:
        ax.fill_between(exceedence, low_percentile, high_percentile, **kwargs)
    else:
        ax.fill_between(exceedence, low_percentile, high_percentile)
    return ax
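The key step in `plot_range_flow_duration_curve` is that every timeseries (column) is sorted on its own. Only then are the percentiles taken across the columns for each rank. The NumPy-only sketch below mirrors that step without plotting, on a randomly generated ensemble (made-up data, not from the question). It also checks that `percentiles=(0, 100)` yields the plain min/max envelope over all runs. The helper name `range_fdc` is hypothetical.

```python
import numpy as np


def range_fdc(x, percentiles=(5, 95)):
    """Hypothetical helper mirroring plot_range_flow_duration_curve, minus plotting.

    x: 2-D array with one timeseries per column (rows are time steps).
    Returns the exceedance in % and the lower and upper percentile curves.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    exceedance = np.arange(1, n + 1) / n * 100.0
    # Sort each column independently, largest value first
    sorted_desc = np.sort(x, axis=0)[::-1]
    # For each rank (row), take percentiles across the timeseries (columns)
    low = np.percentile(sorted_desc, percentiles[0], axis=1)
    high = np.percentile(sorted_desc, percentiles[1], axis=1)
    return exceedance, low, high


rng = np.random.default_rng(42)
ensemble = rng.rayleigh(10, size=(300, 75))  # 300 time steps, 75 model runs

exc, low, high = range_fdc(ensemble, percentiles=(0, 100))
sorted_desc = np.sort(ensemble, axis=0)[::-1]
# With (0, 100) the band is exactly the min/max over all runs at each rank
print(np.allclose(low, sorted_desc.min(axis=1)))   # True
print(np.allclose(high, sorted_desc.max(axis=1)))  # True
```

Because each column is sorted before the percentiles are taken, both band edges decrease monotonically with exceedance. `ax.fill_between(exc, low, high)` then draws the shaded range, as in the answer.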

How to use it:

# Create test data: single timeseries of 300 values, and 75 timeseries of
# 300 values each (np.vstack stacks three 25 x 300 blocks into 75 x 300)
np_array_one_dim = np.random.rayleigh(5, [1, 300])
np_array_75_dim = np.vstack([np.random.rayleigh(11, [25, 300]),
                             np.random.rayleigh(10, [25, 300]),
                             np.random.rayleigh(8, [25, 300])])
df_one_dim = pd.DataFrame(np.random.rayleigh(9, [1, 300]))
df_75_dim = pd.DataFrame(np.vstack([np.random.rayleigh(8, [25, 300]),
                                    np.random.rayleigh(15, [25, 300]),
                                    np.random.rayleigh(3, [25, 300])]))
df_75_dim_transposed = pd.DataFrame(np_array_75_dim.transpose())

# Call the function with all different arguments
fig, subplots = plt.subplots(nrows=2, ncols=3)
ax1 = flow_duration_curve(np_array_one_dim, ax=subplots[0,0], plot=False,
                          axis=1, fdc_kwargs={"linewidth":0.5})
ax1.set_title("np array one dim\nwith kwargs")

ax2 = flow_duration_curve(np_array_75_dim, ax=subplots[0,1], plot=False,
                          axis=1, log=False, percentiles=(0,100))
ax2.set_title("np array 75 dim\nchanged percentiles\nnolog")

ax3 = flow_duration_curve(df_one_dim, ax=subplots[0,2], plot=False, axis=1,
                          log=False, fdc_kwargs={"linewidth":0.5})
ax3.set_title("\ndf one dim\nno log\nwith kwargs")

ax4 = flow_duration_curve(df_75_dim, ax=subplots[1,0], plot=False, axis=1,
                          log=False)
ax4.set_title("df 75 dim\nno log")

ax5 = flow_duration_curve(df_75_dim_transposed, ax=subplots[1,1], 
                          plot=False)
ax5.set_title("df 75 dim transposed")

ax6 = flow_duration_curve(df_75_dim, ax=subplots[1,2], plot=False,
                          comparison=np_array_one_dim, axis=1, 
                          fdc_comparison_kwargs={"color":"black", 
                                                 "label":"comparison",
                                                 "linewidth":0.5},
                          fdc_range_kwargs={"label":"range_fdc"})
ax6.set_title("df 75 dim\n with comparison\nwith kwargs")
ax6.legend()

# Show the beauty
fig.tight_layout()
plt.show()

The results look like this:
