Problem Description
I'm writing code that fetches a dataset using an internal library and the %pyspark interpreter. However, I am unable to pass the dataset to the %python interpreter. I tried string variables and they work fine, but for the dataset I'm using the following code to put it into the Zeppelin context: z.put("input_data", input_data)
This throws the following error:
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
Can you please tell me how I can do this? Thanks in advance.
Recommended Answer
You can put the result into the ResourcePool by printing it as a %table.
%python
print('%table a\tb\n408+\t+408\n0001\t++99\n40817810300001453030\t0000040817810300001453030')
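The hard-coded string above follows Zeppelin's %table display format: a header line followed by data rows, with columns separated by tabs and rows by newlines. As a minimal sketch, tabular data can be serialized into that format programmatically (the helper name to_zeppelin_table is my own, not a Zeppelin API):

```python
def to_zeppelin_table(header, rows):
    # Zeppelin renders "%table" output as TSV: the first line is the
    # header, each following line is one row, columns are tab-separated.
    lines = ["\t".join(header)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "%table " + "\n".join(lines)

# Example: a small two-column table in the same shape as the answer's string.
print(to_zeppelin_table(["a", "b"], [["408+", "+408"], ["0001", "++99"]]))
```

Printing this string in a %python paragraph makes Zeppelin store the table result for that paragraph, which is what the retrieval step below relies on.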
Then retrieve it like this:
%spark.pyspark
# Look up the other paragraph's table result in the shared resource pool.
ic = z.getInterpreterContext()
pool = ic.getResourcePool()
paragraphId = "20180828-093109_1491500809"  # id of the %python paragraph above
t = pool.get(ic.getNoteId(), paragraphId, "zeppelin.paragraph.result.table").get().toString()
print(t)
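The string pulled from the resource pool is the table payload in the same TSV layout, so it can be parsed back into Python structures on the receiving side. A minimal sketch, assuming the stored payload is the raw header-plus-rows text (the helper name parse_zeppelin_table is my own):

```python
def parse_zeppelin_table(payload):
    # Split the TSV payload back into a header list and a list of rows.
    lines = payload.strip().split("\n")
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows

header, rows = parse_zeppelin_table("a\tb\n408+\t+408\n0001\t++99")
```

From here the rows can be handed to something like spark.createDataFrame(rows, header) if a DataFrame is needed on the %pyspark side; note that every value comes back as a string, so numeric columns would need explicit casting.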
This approach can transfer up to roughly 50-100 megabytes of raw data.
In any case, I recommend following @zjffdu's advice and using only one of these interpreters.