Question
I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.
I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all files to my local machine.
I have tried to use cURL, but I can't find the REST API command to download a dbfs:/FileStore file.
Question: How can I download a dbfs:/FileStore file to my local machine?
I am using Databricks Community Edition to teach an undergraduate module in Big Data Analytics in college. I have Windows 7 installed on my local machine. I have checked that cURL and the _netrc file are properly installed and configured, as I have managed to run some of the commands provided by the REST API successfully.
Thank you very much in advance for your help!
Best regards,
Nacho
Recommended answer
There are a few options for downloading FileStore files to your local machine.
Easier options:
- Install the Databricks CLI, configure it with your Databricks credentials, and use the CLI's dbfs cp command. For example: dbfs cp dbfs:/FileStore/test.txt ./test.txt. If you want to download an entire folder of files, you can use dbfs cp -r.
- From a browser signed into Databricks, navigate to https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/files/. If you are using Databricks Community Edition, you may need to use a slightly different path. This download method is described in more detail in the FileStore docs.
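Since the goal is to automate the download, the CLI option can be wrapped in a small script. The following is only a minimal sketch: it assumes the Databricks CLI is already installed and configured so that the dbfs command is on the PATH, and the function names build_copy_command and download_folder are hypothetical helpers made up for this illustration.

```python
import subprocess


def build_copy_command(dbfs_folder, local_folder):
    """Build the `dbfs cp -r` invocation for a recursive folder download."""
    return ["dbfs", "cp", "-r", dbfs_folder, local_folder]


def download_folder(dbfs_folder, local_folder):
    """Run the CLI copy; raises CalledProcessError if the command fails.

    Assumes the `dbfs` executable from the Databricks CLI is on the PATH.
    """
    subprocess.run(build_copy_command(dbfs_folder, local_folder), check=True)
```

For the folder from the question, the call would be download_folder("dbfs:/FileStore/my_result", "./my_result"), which should fetch every part-xxxxx file in one go.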
Advanced options:
- Use the DBFS REST API. You can access file contents using the read API call. To download a large file, you may need to issue multiple read calls to access chunks of the full file.
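The chunked download described in the last option could look roughly like the sketch below. It rests on several assumptions that should be checked against the DBFS API docs: that the read call lives at /api/2.0/dbfs/read, takes path, offset and length parameters, and replies with JSON containing bytes_read and base64-encoded data; that a single read is capped at about 1 MB; and that the path is given in absolute form such as /FileStore/my_result/part-00000. The bearer-token header is also an assumption (the question uses a _netrc file instead, and token authentication may not be available on every edition), and make_rest_reader and download_dbfs_file are hypothetical names.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK_SIZE = 1024 * 1024  # assumed per-call read limit of about 1 MB


def make_rest_reader(host, token):
    """Return a function that performs one DBFS read call over HTTPS.

    `host` (e.g. "https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com")
    and `token` are placeholders; bearer-token auth is an assumption.
    """
    def read_chunk(path, offset, length):
        query = urllib.parse.urlencode(
            {"path": path, "offset": offset, "length": length}
        )
        request = urllib.request.Request(
            f"{host}/api/2.0/dbfs/read?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return read_chunk


def download_dbfs_file(read_chunk, dbfs_path, local_path, chunk_size=CHUNK_SIZE):
    """Copy one DBFS file to local_path by issuing repeated read calls.

    Returns the total number of bytes written.
    """
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = read_chunk(dbfs_path, offset, chunk_size)
            bytes_read = reply.get("bytes_read", 0)
            if bytes_read == 0:
                break  # nothing left to read
            out.write(base64.b64decode(reply["data"]))
            offset += bytes_read
            if bytes_read < chunk_size:
                break  # a short read is taken to mean end of file
    return offset
```

Passing the reader in as a parameter keeps the chunking loop separate from the HTTP details, so the authentication part can be swapped for whatever the workspace actually supports. A call might look like download_dbfs_file(make_rest_reader(host, token), "/FileStore/my_result/part-00000", "part-00000").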