Problem Description
Recently, Databricks launched Databricks Connect, which lets you write Spark jobs locally (for example in your IDE) and run them against a remote Databricks cluster.
It works fine except when I try to access files in Azure Data Lake Storage Gen2. When I execute this:
spark.read.json("abfss://...").count()
I get this error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
Does anybody know how to fix this?
More information:
- databricks-connect version: 5.3.1
Recommended Answer
If you mount the storage rather than accessing it directly with a service principal, you should find this works: https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html
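As a rough sketch of what that mounting approach looks like (following the linked docs; the storage account, container, tenant ID, secret scope, and mount name below are placeholders, not values from the original post), the mount is typically created once from a notebook running on the cluster itself:

# Run once from a notebook on the Databricks cluster, not through the local
# Databricks Connect session. All <...> values are placeholder assumptions.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container under /mnt so it can be read by path later.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)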
I posted some notes on the limitations of Databricks Connect here: https://datathirst.net/blog/2019/3/7/databricks-connect-limitations
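Once the mount exists, the read from the question can go through the mount point instead of the raw abfss:// URI; the idea is that the path is resolved by the remote cluster, so the local Databricks Connect client no longer needs the shaded SecureAzureBlobFileSystem class. A minimal sketch, assuming the placeholder mount name and path above:

# Runs locally through databricks-connect; the work executes on the remote cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Placeholder path under the mount created earlier.
df = spark.read.json("dbfs:/mnt/<mount-name>/path/to/data")
print(df.count())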