This article covers how to handle being unable to connect to Azure Data Lake Gen2 with PySpark and Databricks Connect, which may be a useful reference if you run into the same problem.

Problem Description

Recently, Databricks launched Databricks Connect, which lets you run Spark code from a local environment against a remote Databricks cluster.

It works fine except when I try to access files in Azure Data Lake Storage Gen2. When I execute this:

spark.read.json("abfss://...").count()

I get this error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)

Does anybody know how to fix this?

More information:

  • databricks-connect version: 5.3.1

Recommended Answer

If you mount the storage rather than use a service principal directly, you should find this works: https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html
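As a sketch of what the linked docs describe, mounting ADLS Gen2 with service-principal OAuth credentials looks roughly like the following. All the bracketed values (`<client-id>`, `<tenant-id>`, `<container>`, `<storage-account>`, the mount point) are placeholders, not values from this question:

```python
def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the OAuth extra_configs dict expected by dbutils.fs.mount
    for ADLS Gen2 (keys follow the Databricks ADLS Gen2 documentation)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

configs = build_oauth_configs("<client-id>", "<client-secret>", "<tenant-id>")

# The mount itself must run on the Databricks cluster, where dbutils exists:
# dbutils.fs.mount(
#     source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/data",
#     extra_configs=configs,
# )
#
# Afterwards, the read from the question can go through the mount point:
# spark.read.json("/mnt/data/path/to/files").count()
```

Reads through `/mnt/...` avoid the `abfss://` filesystem class lookup that fails on the local Databricks Connect client, because the ABFS driver is only resolved on the cluster side.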

I posted some notes on the limitations of Databricks Connect here: https://datathirst.net/blog/2019/3/7/databricks-connect-limitations

