How to plot with matplotlib and pandas in a pyspark environment?

Problem description

I have a very large pyspark dataframe. I took a sample and converted it into a pandas dataframe:

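sample = heavy_pivot.sample(withReplacement=False, fraction=0.2, seed=None)  # random ~20% of rows
sample_pd = sample.toPandas()  # collects the sampled rows onto the driver as a pandas DataFrame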

The dataframe looks like this:

sample_pd[['client_id', 'beer_freq']].head(10)


  client_id  beer_freq
0   1000839   0.000000
1   1002185   0.000000
2   1003366   1.000000
3   1005218   1.000000
4   1005483   1.000000
5    100964   0.434783
6    101272   0.166667
7   1017462   0.000000
8   1020561   0.000000
9   1023646   0.000000

I want to plot the column "beer_freq":

import matplotlib.pyplot as plt
plt.switch_backend('agg')  # 'agg' is a non-interactive backend: it renders to files, not to a window

sample_pd.hist('beer_freq', bins=100)

The plot did not show up. Instead, the call returns output like this:

 >>>array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f60f6fd0750>]], dtype=object)
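The array([[<...AxesSubplot...>]]) line is not an error: pandas' DataFrame.hist draws the histogram and returns a 2-D NumPy array of matplotlib Axes objects, one per plotted column, which the interactive prompt then echoes. The minimal sketch below rebuilds a few rows of the sample shown above (assuming beer_freq is a float column) and inspects that return value, using the non-interactive agg backend as in the question.

```python
import matplotlib
matplotlib.use("agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# A few rows copied from the sample output above.
sample_pd = pd.DataFrame({
    "client_id": [1000839, 1002185, 1003366, 1005218, 1005483, 100964, 101272],
    "beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0, 0.434783, 0.166667],
})

# DataFrame.hist draws the figure and returns a 2-D NumPy array of Axes.
axes = sample_pd.hist("beer_freq", bins=100)
print(type(axes).__name__, axes.shape)  # ndarray (1, 1)
print(isinstance(axes[0, 0], Axes))     # True
```

The figure itself exists in memory at this point; it just has nowhere to be displayed under agg.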

It seems that I cannot use ordinary Python code with matplotlib and a pandas dataframe to plot figures in a pyspark environment.

If I call plt.show(), nothing happens...
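With the agg backend selected, matplotlib has no window to draw into, so plt.show() returns without displaying anything, which matches the behaviour described above. A minimal sketch of one common workaround, assuming a headless driver where no GUI backend is available: keep agg and write the figure to a PNG file or to an in-memory buffer instead. The file name beer_freq_hist.png is hypothetical, and the data is again a few rows copied from the sample.

```python
import io

import matplotlib
matplotlib.use("agg")  # non-interactive: renders to files/buffers, never to a window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the sample output above.
sample_pd = pd.DataFrame({
    "client_id": [1000839, 1002185, 1003366, 1005218, 1005483, 100964, 101272],
    "beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0, 0.434783, 0.166667],
})

axes = sample_pd.hist("beer_freq", bins=100)
fig = axes[0, 0].figure  # the Figure that owns the histogram Axes

# Option 1: write the histogram to a PNG file on the driver (hypothetical file name).
fig.savefig("beer_freq_hist.png", dpi=100)

# Option 2: render into memory, e.g. to hand the bytes to a notebook or web view.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()

plt.close(fig)  # release the figure once it has been saved
```

Saving explicitly makes the output independent of whether the Spark driver has a display attached.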

Recommended answer