Plotting with matplotlib and pandas in a PySpark environment

Problem description

I have a very large PySpark DataFrame. I took a sample and converted it into a pandas DataFrame:

sample = heavy_pivot.sample(False, fraction=0.2, seed=None)  # roughly 20% of rows, without replacement
sample_pd = sample.toPandas()  # collects the sample to the driver as a pandas DataFrame

The DataFrame looks like this:

sample_pd[['client_id', 'beer_freq']].head(10)


  client_id  beer_freq
0   1000839   0.000000
1   1002185   0.000000
2   1003366   1.000000
3   1005218   1.000000
4   1005483   1.000000
5    100964   0.434783
6    101272   0.166667
7   1017462   0.000000
8   1020561   0.000000
9   1023646   0.000000

I want to plot the column "beer_freq":

import matplotlib.pyplot as plt
plt.switch_backend('agg')  # 'agg' is a non-interactive backend: it renders off-screen and never opens a window

sample_pd.hist('beer_freq', bins=100)

The plot did not show up. Instead, it gives output like this:

 >>>array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f60f6fd0750>]], dtype=object)
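The echoed array is not an error. In pandas, DataFrame.hist returns a 2-D NumPy array of matplotlib Axes objects, and an interactive shell prints whatever the last expression returns. A minimal sketch of inspecting that return value, assuming a small hypothetical stand-in for sample_pd built from the head(10) rows above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample_pd, using the head(10) values shown above
sample_pd = pd.DataFrame({
    "client_id": [1000839, 1002185, 1003366, 1005218, 1005483,
                  100964, 101272, 1017462, 1020561, 1023646],
    "beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0, 0.434783, 0.166667, 0.0, 0.0, 0.0],
})

axes = sample_pd.hist("beer_freq", bins=100)
print(type(axes), axes.shape)  # one column requested, so a (1, 1) array of Axes

ax = axes[0, 0]         # the single subplot drawn for beer_freq
fig = ax.get_figure()   # the Figure that owns it; this is what gets saved or displayed
```

The Figure obtained this way is the object you would later save to a file or hand to a notebook's display function.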

It seems that I cannot write ordinary Python code using matplotlib and a pandas DataFrame to plot figures in a PySpark environment.

If I call plt.show(), nothing happens.
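With the 'agg' backend, matplotlib draws into an in-memory buffer and has no window to open, so plt.show() has nothing to show. One possible workaround, assuming you can read files written on the driver node, is to save the figure to an image file instead. The data and the output path below are hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive: renders into memory, never opens a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for sample_pd; only the beer_freq column matters here
sample_pd = pd.DataFrame({"beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0,
                                        0.434783, 0.166667, 0.0, 0.0, 0.0]})

fig, ax = plt.subplots()
ax.hist(sample_pd["beer_freq"].dropna(), bins=100)
ax.set_xlabel("beer_freq")
ax.set_ylabel("count")

# Instead of plt.show(), write the rendered figure to disk (hypothetical path)
out_path = os.path.join(tempfile.gettempdir(), "beer_freq_hist.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure once it has been written
```

The PNG can then be opened or downloaded from wherever the driver's temporary directory is accessible.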

Recommended answer

%matplotlib inline is not supported in Databricks. You can display matplotlib figures using display(). For an example, see https://docs.databricks.com/user-guide/visualizations/matplotlib-and-ggplot.html
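A sketch of the display() approach the answer describes: build the figure explicitly and pass the Figure object to display(). display() is only defined inside Databricks notebooks, so the call is left as a comment here; the helper make_beer_freq_figure and the sample data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def make_beer_freq_figure(df: pd.DataFrame, bins: int = 100) -> Figure:
    """Hypothetical helper: draw a histogram of df['beer_freq'] and return its Figure."""
    fig, ax = plt.subplots()
    ax.hist(df["beer_freq"].dropna(), bins=bins)
    ax.set_xlabel("beer_freq")
    ax.set_ylabel("count")
    return fig


# Hypothetical stand-in for sample_pd
sample_pd = pd.DataFrame({"beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0,
                                        0.434783, 0.166667, 0.0, 0.0, 0.0]})
fig = make_beer_freq_figure(sample_pd)

# In a Databricks notebook cell, render the figure with:
# display(fig)
```

Returning the Figure rather than relying on plt.show() keeps the plotting code independent of which backend the cluster has configured.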

07-29 16:25