Converting a pandas DataFrame to a Spark DataFrame in Zeppelin

Question

I am new to Zeppelin. I have a use case where I have a pandas DataFrame and need to visualize it using Zeppelin's built-in charts, but I do not have a clear approach here. My understanding is that Zeppelin can visualize the data if it is in RDD format. So I wanted to convert the pandas DataFrame into a Spark DataFrame, then do some querying (using SQL), and visualize the results. To start with, I tried to convert the pandas DataFrame to Spark's, but I failed:

%pyspark
import pandas as pd
from pyspark.sql import SQLContext
print sc
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
print type(df)
print df
sqlCtx = SQLContext(sc)
sqlCtx.createDataFrame(df).show()

I got the following error:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 162, in <module>
    eval(compiledCode)
  File "<string>", line 8, in <module>
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 406, in createDataFrame
    rdd, schema = self._createFromLocal(data, schema)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 322, in _createFromLocal
    struct = self._inferSchemaFromList(data)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/context.py", line 211, in _inferSchemaFromList
    schema = _infer_schema(first)
  File "/home/bala/Software/spark-1.5.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 829, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <type 'str'>

Can someone please help me out here? Also, correct me if I am wrong anywhere.

Answer

The following works for me with Zeppelin 0.6.0, Spark 1.6.2 and Python 3.5.2:

%pyspark
import pandas as pd
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
z.show(sqlContext.createDataFrame(df))

This renders as: [chart screenshot not included]
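As for why the original snippet failed: the traceback shows Spark's schema inference receiving a plain string. One plausible cause (an assumption, not confirmed in the thread) is that Spark 1.5 fell back to treating the pandas DataFrame as a generic iterable, and iterating a pandas DataFrame yields its column names ("k", "v") rather than its rows. A minimal, version-independent workaround sketch is to hand Spark explicit rows plus column names yourself; `sqlContext` here is assumed to be the SQLContext that Zeppelin's pyspark interpreter provides:

```python
import pandas as pd

df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))

# Iterating over a pandas DataFrame yields its column names, not its rows --
# so a code path that treats the DataFrame as a plain iterable would run
# schema inference on the string "k", matching the TypeError above.
assert list(df) == ["k", "v"]

# Workaround sketch: extract plain Python rows and column names explicitly.
rows = [tuple(r) for r in df.itertuples(index=False)]
columns = [str(c) for c in df.columns]

# Inside Zeppelin you would then pass these to Spark directly:
# sqlContext.createDataFrame(rows, columns).show()
```

Passing an explicit list of tuples plus a list of column names sidesteps pandas-specific conversion entirely, which is why it tends to work across Spark versions.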

