Problem description
I have a very large PySpark DataFrame. I took a sample of it and converted the sample into a pandas DataFrame:
sample = heavy_pivot.sample(False, fraction = 0.2, seed = None)
sample_pd = sample.toPandas()
The DataFrame looks like this:
sample_pd[['client_id', 'beer_freq']].head(10)
client_id beer_freq
0 1000839 0.000000
1 1002185 0.000000
2 1003366 1.000000
3 1005218 1.000000
4 1005483 1.000000
5 100964 0.434783
6 101272 0.166667
7 1017462 0.000000
8 1020561 0.000000
9 1023646 0.000000
I want to plot a histogram of the column "beer_freq":
import matplotlib.pyplot as plt
plt.switch_backend('agg')
sample_pd.hist('beer_freq', bins = 100)
The plot did not show up. Instead, the call returns this:
>>>array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f60f6fd0750>]], dtype=object)
It seems that I cannot use ordinary Python code with matplotlib and a pandas DataFrame to plot figures in a PySpark environment.
If I call plt.show(), nothing happens.
Recommended answer
%matplotlib inline is not supported in Databricks. You can display matplotlib figures using display(). For an example, see https://docs.databricks.com/user-guide/visualizations/matplotlib-and-ggplot.html
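As a rough sketch of what that could look like for the histogram in the question: DataFrame.hist returns an array of Axes (as the output in the question shows), so the figure can be recovered from the first Axes and passed to display(). The small sample_pd built here is a hypothetical stand-in using a few values from the question, and the fallback to savefig outside a notebook, along with the file name, is an assumption added so the snippet also runs where display() is not defined.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for sample_pd; in the question it comes from sample.toPandas()
sample_pd = pd.DataFrame({
    "client_id": [1000839, 1002185, 1003366, 1005218, 1005483, 100964, 101272],
    "beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0, 0.434783, 0.166667],
})

# DataFrame.hist returns a 2-D array of Axes, one per plotted column
axes = sample_pd.hist(column="beer_freq", bins=100)
ax = axes[0, 0]
fig = ax.get_figure()  # the Figure object that holds the histogram

try:
    display(fig)  # notebook built-in; renders the figure in the cell output
except NameError:
    # Assumed fallback when display() is not available (plain Python session)
    fig.savefig("beer_freq_hist.png")
```

Passing the figure object explicitly avoids relying on plt.show(), which does nothing useful under a non-interactive backend such as agg.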