Question
I want to try out cluster-scoped init scripts on an Azure Databricks cluster. I'm struggling to see which commands are available.
Basically, I've got a file on dbfs that I want to copy to a local directory /tmp/config when the cluster spins up.
So I created a very simple bash script:
#!/bin/bash
mkdir -p /tmp/config
databricks fs cp dbfs:/path/to/myFile.conf /tmp/config
Spinning up the cluster fails with "Cluster terminated. Reason: Init Script Failure". Looking at the log on dbfs, I see the error
bash: line 1: databricks: command not found
OK, so databricks is not available as a command. That's the command I use in my local shell to copy files to and from dbfs.
What other commands are available to copy a file from dbfs? And more generally: which commands are actually available?
Answer
DBFS is mounted on the cluster, so you can just copy the file in your shell script:
e.g.
cp /dbfs/your-folder/your-file.txt ./your-file.txt
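Applied to the question, the init script can use the FUSE mount instead of the databricks CLI. A minimal sketch, assuming the source file lives at the path from the question (adjust it to your own file):

```shell
#!/bin/bash
# Cluster-scoped init script sketch: copy a config file from the DBFS
# FUSE mount into a local directory. The source path below is the
# hypothetical one from the question.
mkdir -p /tmp/config
if [ -e /dbfs/path/to/myFile.conf ]; then
  cp /dbfs/path/to/myFile.conf /tmp/config/
fi
```

The same trick works in the other direction: cp ./your-file.txt /dbfs/your-folder/ writes back to DBFS through the mount.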
If you do a dir on the /dbfs location, you get back all the folders/data you have in dbfs.
You can also check this first from a notebook via
%sh
cd /dbfs
dir