How to use Avro on HDInsight Spark/Jupyter?
Problem description
I am trying to read an Avro file in an HDInsight Spark/Jupyter cluster, but I get the following error:
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 159, in load
return self._df(self._jreader.load(path))
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'
The call that triggers the error:
df = spark.read.format("com.databricks.spark.avro").load("wasb://[email protected]/...")
How do I resolve this? It seems I need to install the package, but how can I do that on HDInsight?
Recommended answer
You only need to follow the steps below.
For HDInsight 3.3 and HDInsight 3.4
Add the following cell to your notebook:
%%configure
{ "packages":["com.databricks:spark-avro_2.10:0.1"] }
For HDInsight 3.5
Add the following cell to your notebook:
%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.10:0.1" }}
For HDInsight 3.6
Add the following cell to your notebook:
%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}