Question
I am specifying a NodeInitializationAction for Dataproc as follows:
ClusterConfig clusterConfig = new ClusterConfig();
clusterConfig.setGceClusterConfig(...);
clusterConfig.setMasterConfig(...);
clusterConfig.setWorkerConfig(...);
List<NodeInitializationAction> initActions = new ArrayList<>();
NodeInitializationAction action = new NodeInitializationAction();
action.setExecutableFile("gs://mybucket/myExecutableFile");
initActions.add(action);
clusterConfig.setInitializationActions(initActions);
And then:
Cluster cluster = new Cluster();
cluster.setProjectId("wide-isotope-147019");
cluster.setConfig(clusterConfig);
cluster.setClusterName("cat");
Then finally, I invoke the dataproc.create operation with the cluster. I can see the cluster being created, but when I ssh into the master machine ("cat-m" in us-central1-f), I see no evidence of the script I specified having been copied over or run.
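For completeness, the create call I'm describing looks roughly like this; a minimal sketch assuming the generated google-api-services-dataproc v1 client and application-default credentials, with the "global" region string and application name as placeholders rather than my exact code:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataproc.Dataproc;
import com.google.api.services.dataproc.model.Operation;

// Inside a method that throws Exception; "cluster" is the Cluster assembled above.
Dataproc dataproc = new Dataproc.Builder(
        GoogleNetHttpTransport.newTrustedTransport(),
        JacksonFactory.getDefaultInstance(),
        GoogleCredential.getApplicationDefault())
    .setApplicationName("dataproc-init-action-question")
    .build();

// The create call returns an Operation; the cluster is provisioned asynchronously.
Operation op = dataproc.projects().regions().clusters()
    .create("wide-isotope-147019", "global", cluster)
    .execute();
System.out.println("Create operation: " + op.getName());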
So this leads to my questions:
- What should I expect in terms of evidence? (I did find the script itself in /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0.)
- Where is the script invoked from? I know it runs as the user root, but beyond that I am not sure where to find it. I did not find it in the root directory.
- At what point does the Operation returned from the Create call change from "CREATING" to "RUNNING"? Does this happen before or after the script is invoked, and does it matter whether the script's exit code is non-zero?
Thanks.
Answer
Dataproc makes a number of guarantees about init actions:
- Each script should be downloaded and stored locally in /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0.
- The output of the script will be captured in a "staging bucket" (either the bucket specified via the --bucket option, or a Dataproc auto-generated bucket). Assuming your cluster is named my-cluster, if you describe the master instance via gcloud compute instances describe my-cluster-m, the exact location is in the dataproc-agent-output-directory metadata key (see the sketch after this list for reading that key from code).
- The cluster may not enter the RUNNING state (and the Operation may not complete) until all init actions have executed on all nodes. If an init action exits with a non-zero code, or exceeds the specified timeout, it will be reported as such.
- Similarly, if you resize a cluster, we guarantee that new workers do not join the cluster until each worker has been fully configured in isolation.
- If you still don't believe me :) inspect the Dataproc agent log in /var/log/google-dataproc-agent-0.log and look for entries from BootstrapActionRunner.
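If you want to read that metadata key from code rather than via gcloud, a rough sketch with the google-api-services-compute v1 client (an assumption about your setup; the project, zone, and instance name are taken from your question) would look like this:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.compute.Compute;
import com.google.api.services.compute.model.Instance;
import com.google.api.services.compute.model.Metadata;

// Inside a method that throws Exception.
Compute compute = new Compute.Builder(
        GoogleNetHttpTransport.newTrustedTransport(),
        JacksonFactory.getDefaultInstance(),
        GoogleCredential.getApplicationDefault())
    .setApplicationName("dataproc-init-action-check")
    .build();

// Equivalent of: gcloud compute instances describe cat-m
Instance master = compute.instances()
    .get("wide-isotope-147019", "us-central1-f", "cat-m")
    .execute();

// Find the staging-bucket location where init action output is captured.
for (Metadata.Items item : master.getMetadata().getItems()) {
    if ("dataproc-agent-output-directory".equals(item.getKey())) {
        System.out.println("Agent output directory: " + item.getValue());
    }
}

The value is a GCS path under the cluster's staging bucket; the captured init action output lives under that directory.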