This article covers how to copy a file from DBFS in a cluster-scoped init script; it may serve as a useful reference for anyone hitting the same problem.

Problem Description

I want to try out cluster-scoped init scripts on an Azure Databricks cluster. I'm struggling to see which commands are available.

Basically, I've got a file on DBFS that I want to copy to a local directory, /tmp/config, when the cluster spins up.

So I created a very simple bash script:

#!/bin/bash
mkdir -p /tmp/config
databricks fs cp dbfs:/path/to/myFile.conf /tmp/config

Spinning up the cluster fails with "Cluster terminated. Reason: Init Script Failure". Looking at the log on DBFS, I see the error:

bash: line 1: databricks: command not found

OK, so databricks is not available as a command. That's the command I use in my local bash shell to copy files to and from DBFS.

What other commands are available to copy a file from DBFS? And more generally: which commands are actually available?

Recommended Answer

DBFS is mounted on the cluster nodes, so you can just copy the file in your shell script:

For example:

cp /dbfs/your-folder/your-file.txt ./your-file.txt
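
Applied to the script from the question, the whole init script reduces to plain bash against the /dbfs mount. Here is a minimal sketch, assuming the same hypothetical paths as in the question:

#!/bin/bash
# Cluster-scoped init script: copy a config file from the DBFS FUSE mount
# (visible on the node as /dbfs) into a local directory on the node.
mkdir -p /tmp/config
cp /dbfs/path/to/myFile.conf /tmp/config/

No Databricks CLI is needed; cp works directly against the mount.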

If you run dir on the /dbfs location, it returns a listing of all the folders/data you have in DBFS.

You can also explore it first from a notebook cell:

%sh
cd /dbfs
dir
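
Here dir is simply the GNU coreutils listing command, so ls works just as well. The same mount is visible from inside an init script, which makes it handy for debugging; as a sketch (the path below is hypothetical), a listing written to stdout ends up in the cluster's init script logs:

#!/bin/bash
# Debugging sketch: print the contents of a DBFS folder from within an
# init script; stdout is captured in the init script logs.
ls -la /dbfs/path/to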

This concludes the article on copying a file from DBFS in a cluster-scoped init script. Hopefully the recommended answer is helpful.
