Problem description
I'm trying to make a time series plot with seaborn from a dataframe that has multiple series.
From this post: seaborn time series from pandas dataframe
I gather that tsplot isn't going to work, as it is meant to plot uncertainty.
So is there another Seaborn method that is meant for line charts with multiple series?
My dataframe is as follows:
print(df.info())
print(df.describe())
print(df.values)
print(df.index)
Output:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2013-01-03 to 2014-01-03
Data columns (total 5 columns):
Equity(24 [AAPL]) 253 non-null float64
Equity(3766 [IBM]) 253 non-null float64
Equity(5061 [MSFT]) 253 non-null float64
Equity(6683 [SBUX]) 253 non-null float64
Equity(8554 [SPY]) 253 non-null float64
dtypes: float64(5)
memory usage: 11.9 KB
None
Equity(24 [AAPL]) Equity(3766 [IBM]) Equity(5061 [MSFT]) \
count 253.000000 253.000000 253.000000
mean 67.560593 194.075383 32.547436
std 6.435356 11.175226 3.457613
min 55.811000 172.820000 26.480000
25% 62.538000 184.690000 28.680000
50% 65.877000 193.880000 33.030000
75% 72.299000 203.490000 34.990000
max 81.463000 215.780000 38.970000
Equity(6683 [SBUX]) Equity(8554 [SPY])
count 253.000000 253.000000
mean 33.773277 164.690180
std 4.597291 10.038221
min 26.610000 145.540000
25% 29.085000 156.130000
50% 33.650000 165.310000
75% 38.280000 170.310000
max 40.995000 184.560000
[[ 77.484 195.24 27.28 27.685 145.77 ]
[ 75.289 193.989 26.76 27.85 146.38 ]
[ 74.854 193.2 26.71 27.875 145.965]
...,
[ 80.167 187.51 37.43 39.195 184.56 ]
[ 79.034 185.52 37.145 38.595 182.95 ]
[ 77.284 186.66 36.92 38.475 182.8 ]]
DatetimeIndex(['2013-01-03', '2013-01-04', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10', '2013-01-11', '2013-01-14',
'2013-01-15', '2013-01-16',
...
'2013-12-19', '2013-12-20', '2013-12-23', '2013-12-24',
'2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
'2014-01-02', '2014-01-03'],
dtype='datetime64[ns]', length=253, freq=None, tz='UTC')
This works (but I want to get my hands dirty with Seaborn):
df.plot()
Output: [plot image]
Thanks for your time!
Update1:
df.to_dict()
returns: https://gist.github.com/anonymous/2bdc1ce0f9d0b6ccd6675ab4f7313a5f
Update2:
Using @knagaev's sample code, I've narrowed it down to this difference:
current dataframe (output of print(current_df)):
Equity(24 [AAPL]) Equity(3766 [IBM]) \
2013-01-03 00:00:00+00:00 77.484 195.2400
2013-01-04 00:00:00+00:00 75.289 193.9890
2013-01-07 00:00:00+00:00 74.854 193.2000
2013-01-08 00:00:00+00:00 75.029 192.8200
2013-01-09 00:00:00+00:00 73.873 192.3800
desired dataframe (output of print(desired_df)):
Date Company Kind Price
0 2014-01-02 IBM Open 187.210007
1 2014-01-02 IBM High 187.399994
2 2014-01-02 IBM Low 185.199997
3 2014-01-02 IBM Close 185.529999
4 2014-01-02 IBM Volume 4546500.000000
5 2014-01-02 IBM Adj Close 171.971090
6 2014-01-02 MSFT Open 37.349998
7 2014-01-02 MSFT High 37.400002
8 2014-01-02 MSFT Low 37.099998
9 2014-01-02 MSFT Close 37.160000
10 2014-01-02 MSFT Volume 30632200.000000
11 2014-01-02 MSFT Adj Close 34.960000
12 2014-01-02 ORCL Open 37.779999
13 2014-01-02 ORCL High 38.029999
14 2014-01-02 ORCL Low 37.549999
15 2014-01-02 ORCL Close 37.840000
16 2014-01-02 ORCL Volume 18162100.000000
What's the best way to reorganize current_df into desired_df?
Update 3: I finally got it working with the help of @knagaev:
I had to add a dummy column as well as finesse the index:
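# Copy the DatetimeIndex into a regular column so melt can keep it
df['Datetime'] = df.index
# Wide -> long: one row per (Datetime, Security) pair
melted_df = pd.melt(df, id_vars='Datetime', var_name='Security', value_name='Price')
# Constant column used as tsplot's unit
melted_df['Dummy'] = 0
# ax is an existing matplotlib Axes
sns.tsplot(melted_df, time='Datetime', unit='Dummy', condition='Security', value='Price', ax=ax)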
Produces: [plot image]
Recommended answer
You can try using tsplot.
You will draw your line charts with standard errors ("statistical additions").
I tried to simulate your dataset. Here are the results:
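# Note: pandas.io.data has since been removed from pandas; the separate
# pandas-datareader package (pandas_datareader.data) is its replacement.
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2014, 1, 1)
end = datetime(2014, 3, 28)
# Download prices for each ticker (a Panel in the pandas version used here)
f = web.DataReader(stocks, 'yahoo', start, end)
# Flatten to long form: one row per (date, company, kind)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']
sns.tsplot(df, time='Date', unit='Kind', condition='Company', value='Price')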
By the way, this sample is quite artificial. The parameter "unit" is the "Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation." (from the documentation). So I used the 'Kind' field purely for illustrative purposes.
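To make the quoted description of "unit" more concrete, here is a minimal pandas sketch of what collapsing over units amounts to: for each time/condition pair, the values from all units are averaged, and their spread is what an error band would summarize. The numbers are made up, and this only mimics the aggregation; it is not tsplot's actual implementation.

```python
import pandas as pd

# Hypothetical long-form data: two dates, one company, three "units" (Kind).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-02"] * 3 + ["2014-01-03"] * 3),
    "Company": ["IBM"] * 6,
    "Kind": ["Open", "High", "Low"] * 2,
    "Price": [187.0, 188.0, 186.0, 186.0, 187.5, 185.5],
})

# Collapse over units: one mean (the line) and one standard error
# (the spread) per time/condition observation.
summary = (
    df.groupby(["Date", "Company"])["Price"]
      .agg(["mean", "sem", "count"])
      .reset_index()
)
print(summary)

# With a single unit per observation (a constant "Dummy" column), there is
# nothing to average over: the mean is the raw value and the standard
# error is undefined (NaN).
single = df[df["Kind"] == "Open"].assign(Dummy=0)
single_summary = single.groupby(["Date", "Company"])["Price"].agg(["mean", "sem"])
print(single_summary)
```

This is why using 'Kind' as the unit mixes unrelated quantities (Open, High, Volume, ...) into one average, while a constant dummy unit leaves each series as a plain line.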
OK, I made an example for your dataframe. It has a dummy field for "noise cleaning" :)
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2010, 1, 1)
end = datetime(2015, 12, 31)
f = web.DataReader(stocks, 'yahoo', start, end)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']
# Keep only the opening prices so each (Date, Company) pair has one value
df_open = df[df['Kind'] == 'Open'].copy()
# Constant column used as the unit
df_open['Dummy'] = 0
sns.tsplot(df_open, time='Date', unit='Dummy', condition='Company', value='Price')
P.S. Thanks to @VanPeer: you can now use seaborn.lineplot for this problem.