Problem description
Flow duration curves are a common way in hydrology (and other fields) to visualize timeseries. They allow an easy assessment of the high and low values in a timeseries and of how often certain values are reached. Is there an easy way to plot one in Python? I could not find any matplotlib tool that does this. No other package seems to include it either, at least not with the possibility of easily plotting a range of flow duration curves.
An example of a flow duration curve would be:
A general explanation of how to create one can be found here: http://www.renewablesfirst.co.uk/hydropower/hydropower-learning-centre/what-is-a-flow-duration-curve/
So the basic calculation and plotting of a flow duration curve are pretty straightforward: calculate the exceedance and plot it against the sorted timeseries (see the answer by ImportanceOfBeingErnest). It gets more difficult, though, if you have several timeseries and want to plot the range of values for all exceedance probabilities. I present one solution in my answer to this thread, but would be glad to hear of more elegant ones. My solution also makes it easy to use as a subplot, as it is common to have several timeseries for different locations that have to be plotted separately.
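The basic calculation described above can be sketched roughly as follows. This is only a minimal illustration: the helper name `exceedance_curve` and the discharge values are made up for the example and are not taken from any of the answers.

```python
import numpy as np


def exceedance_curve(timeseries):
    """Return (exceedance in percent, flows sorted from high to low).

    Hypothetical helper, written only to illustrate the calculation.
    """
    flows = np.sort(np.asarray(timeseries, dtype=float))[::-1]
    n = len(flows)
    # The flow with rank i (1 = highest) is equalled or exceeded
    # in i out of n time steps
    exceedance = np.arange(1.0, n + 1) / n * 100
    return exceedance, flows


# Made-up discharge values, for illustration only
discharge = [3.0, 12.0, 7.5, 1.2, 5.0]
exceedance, flows = exceedance_curve(discharge)
```

Plotting `flows` against `exceedance` (for example with `ax.plot(exceedance, flows)`) then gives the curve.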
An example of what I mean by a range of flow duration curves would be this:
Here you can see three distinct curves. The black line is the measured value from a river, while the two shaded areas are the ranges covered by all model runs of two models. So what would be the easiest way to calculate and plot a range of flow duration curves for several timeseries?
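One possible way to compute such a shaded range, sketched under the assumption that all model runs have the same length and are stored as columns of a 2-D array, is to sort every run separately and then take percentiles across the runs. The function name `exceedance_band` and the random test data are hypothetical.

```python
import numpy as np


def exceedance_band(runs, percentiles=(5, 95)):
    """Lower and upper envelope of several flow duration curves.

    Assumes `runs` is 2-D with one row per time step and one column
    per model run. Hypothetical helper, for illustration only.
    """
    runs = np.asarray(runs, dtype=float)
    # Sort every run (column) from high to low flow
    sorted_runs = np.sort(runs, axis=0)[::-1]
    n = sorted_runs.shape[0]
    exceedance = np.arange(1.0, n + 1) / n * 100
    # Spread across the runs at each exceedance level
    lower = np.percentile(sorted_runs, percentiles[0], axis=1)
    upper = np.percentile(sorted_runs, percentiles[1], axis=1)
    return exceedance, lower, upper


# Made-up data: 25 model runs with 300 time steps each
rng = np.random.default_rng(0)
runs = rng.rayleigh(10, size=(300, 25))
exceedance, lower, upper = exceedance_band(runs, percentiles=(0, 100))
```

The band can then be drawn with matplotlib's `ax.fill_between(exceedance, lower, upper)`.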
EDIT: As my first answer was overly complicated and inelegant, I rewrote it to incorporate the solutions by ImportanceOfBeingErnest. I still keep the new version here, alongside the one by ImportanceOfBeingErnest, because I think the additional functionality might make it easier for other people to plot flow duration curves for their timeseries. If anyone has additional ideas, see: Github Repository
Features are:
Changing the percentiles for a range flow duration curve
Easy usage as a standalone figure or as a subplot. If a subplot object is provided, the flow duration curve is drawn into it. If None is provided, a new one is created and returned
Separate kwargs for the range curve and its comparison
Changing the y-axis to logarithmic scale with a keyword
Extended example to help understand its usage.
The code is the following:
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 15 10:09:13 2018
@author: Florian Ulrich Jehn
"""
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def flow_duration_curve(x, comparison=None, axis=0, ax=None, plot=True,
                        log=True, percentiles=(5, 95), decimal_places=1,
                        fdc_kwargs=None, fdc_range_kwargs=None,
                        fdc_comparison_kwargs=None):
    """
    Calculates and plots a flow duration curve from x.

    All observations/simulations are ordered and the empirical probability is
    calculated. This is then plotted as a flow duration curve.

    When x has more than one dimension along axis, a range flow duration curve
    is plotted. This means that for every probability a lower and an upper
    percentile of the flow is determined. This is then plotted as a fill
    between.

    Additionally a comparison can be given to the function, which is plotted in
    the same ax.

    :param x: numpy array or pandas dataframe, discharge of measurements or
    simulations
    :param comparison: numpy array or pandas dataframe of discharge that should
    also be plotted in the same ax
    :param axis: int, axis along which x is iterated through
    :param ax: matplotlib subplot object, if not None, will plot in that
    instance
    :param plot: bool, if False function will not show the plot, but simply
    return the ax object
    :param log: bool, if True use a logarithmic y-axis
    :param percentiles: tuple of int, percentiles that should be used for
    drawing a range flow duration curve
    :param decimal_places: int, currently not used by this function
    :param fdc_kwargs: dict, matplotlib keywords for the normal fdc
    :param fdc_range_kwargs: dict, matplotlib keywords for the range fdc
    :param fdc_comparison_kwargs: dict, matplotlib keywords for the comparison
    fdc
    :return: subplot object with the flow duration curve in it
    """
    # Convert x to a pandas dataframe, for easier handling
    if not isinstance(x, pd.DataFrame):
        x = pd.DataFrame(x)

    # Bring the dataframe into the expected orientation (time steps as rows)
    if axis != 0:
        x = x.transpose()

    # Convert comparison to a dataframe as well and transpose it if necessary.
    # The None check avoids calling transpose() on a missing comparison.
    if comparison is not None:
        if not isinstance(comparison, pd.DataFrame):
            comparison = pd.DataFrame(comparison)
        if axis != 0:
            comparison = comparison.transpose()

    # Create an ax if necessary
    if ax is None:
        fig, ax = plt.subplots(1, 1)

    # Make the y scale logarithmic if needed
    if log:
        ax.set_yscale("log")

    # Determine if it is a range flow curve or a normal one by checking the
    # dimensions of the dataframe
    # If there is only one column, make a single fdc
    if x.shape[1] == 1:
        # iloc selects the first column regardless of its label
        plot_single_flow_duration_curve(ax, x.iloc[:, 0], fdc_kwargs)
    # Otherwise make a range flow duration curve
    else:
        plot_range_flow_duration_curve(ax, x, percentiles, fdc_range_kwargs)

    # Add a comparison to the plot if one is present
    if comparison is not None:
        ax = plot_single_flow_duration_curve(ax, comparison.iloc[:, 0],
                                             fdc_comparison_kwargs)

    # Name the x-axis
    ax.set_xlabel("Exceedance [%]")

    # Show if requested
    if plot:
        plt.show()

    return ax
def plot_single_flow_duration_curve(ax, timeseries, kwargs):
    """
    Plots a single fdc into an ax.

    :param ax: matplotlib subplot object
    :param timeseries: list like iterable
    :param kwargs: dict, keyword arguments for matplotlib
    :return: subplot object with a flow duration curve drawn into it
    """
    # Get the probability
    exceedence = np.arange(1., len(timeseries) + 1) / len(timeseries)
    exceedence *= 100
    # Plot the curve; an empty dict stands in for missing kwargs
    ax.plot(exceedence, sorted(timeseries, reverse=True), **(kwargs or {}))
    return ax
def plot_range_flow_duration_curve(ax, x, percentiles, kwargs):
    """
    Plots a single range fdc into an ax.

    :param ax: matplotlib subplot object
    :param x: dataframe of several timeseries, one column per timeseries
    :param percentiles: tuple of int, lower and upper percentile that bound
    the shaded range
    :param kwargs: dict, keyword arguments for matplotlib
    :return: subplot object with a range flow duration curve drawn into it
    """
    # Get the probabilities
    exceedence = np.arange(1., len(x) + 1) / len(x)
    exceedence *= 100
    # Sort every timeseries (column) from high to low
    sort = np.sort(x, axis=0)[::-1]
    # Get the percentiles across all timeseries for every probability
    low_percentile = np.percentile(sort, percentiles[0], axis=1)
    high_percentile = np.percentile(sort, percentiles[1], axis=1)
    # Plot it; an empty dict stands in for missing kwargs
    ax.fill_between(exceedence, low_percentile, high_percentile,
                    **(kwargs or {}))
    return ax
How to use it:
# Create test data
np_array_one_dim = np.random.rayleigh(5, [1, 300])
np_array_75_dim = np.c_[np.random.rayleigh(11, [25, 300]),
                        np.random.rayleigh(10, [25, 300]),
                        np.random.rayleigh(8, [25, 300])]
df_one_dim = pd.DataFrame(np.random.rayleigh(9, [1, 300]))
df_75_dim = pd.DataFrame(np.c_[np.random.rayleigh(8, [25, 300]),
                               np.random.rayleigh(15, [25, 300]),
                               np.random.rayleigh(3, [25, 300])])
df_75_dim_transposed = pd.DataFrame(np_array_75_dim.transpose())

# Call the function with all different arguments
fig, subplots = plt.subplots(nrows=2, ncols=3)
ax1 = flow_duration_curve(np_array_one_dim, ax=subplots[0, 0], plot=False,
                          axis=1, fdc_kwargs={"linewidth": 0.5})
ax1.set_title("np array one dim\nwith kwargs")

ax2 = flow_duration_curve(np_array_75_dim, ax=subplots[0, 1], plot=False,
                          axis=1, log=False, percentiles=(0, 100))
ax2.set_title("np array 75 dim\nchanged percentiles\nnolog")

ax3 = flow_duration_curve(df_one_dim, ax=subplots[0, 2], plot=False, axis=1,
                          log=False, fdc_kwargs={"linewidth": 0.5})
ax3.set_title("\ndf one dim\nno log\nwith kwargs")

ax4 = flow_duration_curve(df_75_dim, ax=subplots[1, 0], plot=False, axis=1,
                          log=False)
ax4.set_title("df 75 dim\nno log")

ax5 = flow_duration_curve(df_75_dim_transposed, ax=subplots[1, 1],
                          plot=False)
ax5.set_title("df 75 dim transposed")

ax6 = flow_duration_curve(df_75_dim, ax=subplots[1, 2], plot=False,
                          comparison=np_array_one_dim, axis=1,
                          fdc_comparison_kwargs={"color": "black",
                                                 "label": "comparison",
                                                 "linewidth": 0.5},
                          fdc_range_kwargs={"label": "range_fdc"})
ax6.set_title("df 75 dim\n with comparison\nwith kwargs")
ax6.legend()

# Show the beauty
fig.tight_layout()
plt.show()
The results look like this: