I'm trying to produce a bar chart for time-series data, similar to the following example:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource,FactorRange
from bokeh.palettes import Spectral6
from bokeh.plotting import figure
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits': fruits,
        '2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}
# This creates [("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"),
# ("Pears", "2015"), ...]
x = [ (fruit, year) for fruit in fruits for year in years ]
# Interleave the per-year values so they line up with x (like an hstack)
counts = sum(zip(data['2015'], data['2016'], data['2017']), ())
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year",
           toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
Here is my data:
import pandas as pd
import numpy as np
dates = pd.date_range('20190101', periods=100)
dfr = pd.DataFrame(np.random.randn(100, 6), index=dates,
                   columns=list('ABCDEF'))
dfr = dfr.resample('M').sum()
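I can't figure out how to convert dfr into a dictionary so that I get a bar chart like the working example. Thanks in advance. Please suggest a way forward.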
Best answer
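You need to reshape the DataFrame into a Series with stack, then convert the first level of the resulting MultiIndex to strings in YYYY-MM-DD format, and then pass the result to the dictionary: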
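import numpy as np
import pandas as pd
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
output_file("bars.html")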
dates = pd.date_range('20190101', periods=100)
dfr = pd.DataFrame(np.random.randn(100, 6), index=dates, columns=list('ABCDEF'))
s = dfr.resample('M').sum().stack()
# Bokeh categorical factors must be strings, so format the dates explicitly
s.index = [s.index.get_level_values(0).strftime('%Y-%m-%d'),
           s.index.get_level_values(1)]
x = s.index.values
print (x)
[('2019-01-31', 'A') ('2019-01-31', 'B') ('2019-01-31', 'C')
('2019-01-31', 'D') ('2019-01-31', 'E') ('2019-01-31', 'F')
('2019-02-28', 'A') ('2019-02-28', 'B') ('2019-02-28', 'C')
('2019-02-28', 'D') ('2019-02-28', 'E') ('2019-02-28', 'F')
('2019-03-31', 'A') ('2019-03-31', 'B') ('2019-03-31', 'C')
('2019-03-31', 'D') ('2019-03-31', 'E') ('2019-03-31', 'F')
('2019-04-30', 'A') ('2019-04-30', 'B') ('2019-04-30', 'C')
('2019-04-30', 'D') ('2019-04-30', 'E') ('2019-04-30', 'F')]
counts = s.values
print (counts)
[ 5.8759305 -7.52857928 2.74794675 9.91942791 1.49860961 0.16046735
0.15459667 3.86407105 0.79097565 -2.65899131 1.86548175 1.41251127
-3.67053891 13.90439142 2.80744458 2.51583516 -2.37587758 4.49826959
-0.7661524 -6.22533991 5.90391326 4.40654035 1.93598738 2.49407506]
source = ColumnDataSource(data=dict(x=x, counts=counts))
# Note: Bokeh 3.x renamed plot_height to height
p = figure(x_range=FactorRange(*x), plot_height=250, title="Sums by Months",
           toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
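# The monthly sums can be negative, so y_range.start is left unset here;
# forcing it to 0 would cut off the bars that extend below zero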
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
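The reshaping step can be checked on its own, without Bokeh. The following is only a minimal sketch on a small hand-made frame: `to_factors`, `small` and the `date_format` parameter are hypothetical names used for illustration and are not part of pandas or Bokeh. It builds the same (date, column) tuples and the matching flat list of values as above, but returns plain Python lists instead of NumPy arrays.

```python
import pandas as pd


def to_factors(df, date_format="%Y-%m-%d"):
    """Return (x, counts) suitable for a nested categorical axis.

    Hypothetical helper for illustration; assumes df has a DatetimeIndex
    and a single level of column labels.
    """
    # stack() moves the column labels into a second index level,
    # giving one row per (date, column) pair.
    s = df.stack()
    # Categorical factors must be strings, so format the dates explicitly.
    dates = s.index.get_level_values(0).strftime(date_format)
    columns = s.index.get_level_values(1).astype(str)
    x = list(zip(dates, columns))
    counts = s.tolist()
    return x, counts


# Small hand-made frame standing in for the resampled data.
small = pd.DataFrame(
    {"A": [1.0, 3.0], "B": [2.0, 4.0]},
    index=pd.to_datetime(["2019-01-31", "2019-02-28"]),
)
x, counts = to_factors(small)
print(x)
print(counts)
```

Here x comes out grouped by date, with the columns cycling inside each date, and counts follows the same order. The two lists can then be passed to ColumnDataSource and FactorRange exactly as x and counts are in the answer.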