I can't find evidence that Dataproc ran my NodeInitializationAction

Problem description

I am specifying a NodeInitializationAction for Dataproc as follows:

ClusterConfig clusterConfig = new ClusterConfig();
clusterConfig.setGceClusterConfig(...);
clusterConfig.setMasterConfig(...);
clusterConfig.setWorkerConfig(...);

// Register one initialization action that points at a script in Cloud Storage.
List<NodeInitializationAction> initActions = new ArrayList<>();
NodeInitializationAction action = new NodeInitializationAction();
action.setExecutableFile("gs://mybucket/myExecutableFile");
initActions.add(action);
clusterConfig.setInitializationActions(initActions);

Then, later:

Cluster cluster = new Cluster();
cluster.setProjectId("wide-isotope-147019");
cluster.setConfig(clusterConfig);
cluster.setClusterName("cat");

Finally, I invoke the dataproc.create operation with the cluster. I can see the cluster being created, but when I SSH into the master machine ("cat-m" in us-central1-f), I see no evidence that the script I specified was copied over or run.

So this leads to my questions:

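  1. What should I expect in terms of evidence? (I found the script itself in /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0.)
  2. Where is the script invoked from? I know it runs as the user root, but beyond that I am not sure where to look. I did not find it in the root directory.
  3. At what point does the Operation returned from the Create call change from "CREATING" to "RUNNING"? Does this happen before or after the script is invoked, and does it matter whether the script's exit code is non-zero?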

Thanks.

Recommended answer

Dataproc makes a number of guarantees about init actions:

  • Each script should be downloaded and stored locally in /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0

  • The output of the script will be captured in a "staging bucket" (either the bucket specified via the --bucket option, or a Dataproc auto-generated bucket). Assuming your cluster is named my-cluster, if you describe the master instance via gcloud compute instances describe my-cluster-m, the exact location is in the dataproc-agent-output-directory metadata key.

  • The Cluster may not enter the RUNNING state (and the Operation may not complete) until all init actions have executed on all nodes. If an init action exits with a non-zero code, or exceeds the specified timeout, it will be reported as such.

  • Similarly, if you resize a cluster, we guarantee that new workers do not join the cluster until each worker is fully configured in isolation.

  • If you still don't believe me :) inspect the Dataproc agent log in /var/log/google-dataproc-agent-0.log and look for entries from BootstrapActionRunner.
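To make the last check concrete, here is a minimal sketch that prints only the agent log lines mentioning BootstrapActionRunner. The class AgentLogFilter and its method names are hypothetical helpers written for this illustration, not part of Dataproc or its client libraries. The sketch also assumes that a plain substring match is enough to pick out the relevant entries, since the exact log line format is not described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for this example; not part of Dataproc. */
final class AgentLogFilter {
    // Component name mentioned in the answer. Matching on it as a substring is an assumption.
    static final String MARKER = "BootstrapActionRunner";

    /** Returns the lines that mention the marker, preserving their original order. */
    static List<String> initActionEntries(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the log path given in the answer; pass another path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/log/google-dataproc-agent-0.log");
        initActionEntries(Files.readAllLines(log)).forEach(System.out::println);
    }
}
```

Run it on the master node (for example with java AgentLogFilter.java on Java 11 or later), likely as a user with permission to read the log. If nothing is printed, either no init action was logged by that component or the substring assumption does not hold for your image version.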

