Not able to cat a dbfs file in a Databricks Community Edition cluster. FileNotFoundError: [Errno 2] No such file or directory

Problem description

Trying to read a Delta log file in a Databricks Community Edition cluster (databricks-7.2 version):

df=spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")


with open('/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
  for l in f:
    print(l)

Getting a file-not-found error:

FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r')  as f:
      2   for l in f:
      3     print(l)

FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'

I have tried adding the /dbfs/ and dbfs:/ prefixes; nothing worked, and I still get the same error.

with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
  for l in f:
    print(l)

But using dbutils.fs.head I was able to read the file:

dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")

'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}
{"protocol":{"minReaderVersi...etc

How can we read/cat a DBFS file in Databricks with the Python open method?

Recommended answer

By default, this data lives on DBFS, and your code needs to understand how to access it. Plain Python doesn't know about DBFS - that's why it's failing.

But there is a workaround - DBFS is mounted on the cluster nodes at /dbfs, so you just need to prepend it to your file name: instead of /user/delta_test/_delta_log/00000000000000000000.json, use /dbfs/user/delta_test/_delta_log/00000000000000000000.json.
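The mapping from a DBFS path (or a dbfs:/ URI) to its FUSE-mount equivalent is a simple string transformation; a minimal sketch, assuming the standard /dbfs mount point (the helper name dbfs_to_local is illustrative, not part of any Databricks API):

```python
def dbfs_to_local(path: str) -> str:
    """Map a DBFS path or dbfs:/ URI to its /dbfs FUSE-mount equivalent."""
    if path.startswith("dbfs:/"):
        # Drop the URI scheme, keeping the leading slash of the path.
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        path = "/" + path
    return "/dbfs" + path

# dbfs_to_local("dbfs:/user/delta_test/a.json") -> "/dbfs/user/delta_test/a.json"
# dbfs_to_local("/user/delta_test/a.json")      -> "/dbfs/user/delta_test/a.json"
```

The resulting path can then be handed directly to Python's built-in open() on clusters where the mount is available.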

Update: on Community Edition, in DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory, such as /tmp or /var/tmp, and then read from it:

dbutils.fs.cp("/file_on_dbfs", "file:///tmp/local_file")

Please note that if you don't specify a URI scheme, the path refers to DBFS by default; to refer to a local file you need to use the file:// prefix (see docs).

