Unable to create a Dataproc cluster when setting the fs.defaultFS property?

This article covers the question "Unable to create a Dataproc cluster when setting the fs.defaultFS property?" and how to resolve it, which should be a useful reference for anyone hitting the same problem.

Problem Description

This was already discussed in a previous post, but I'm not convinced by the answers there, since the Google docs state that it is possible to create a cluster with the fs.defaultFS property set. Moreover, even if this property can be set programmatically, it is sometimes more convenient to set it from the command line.

So I would like to know why the following option does not work when passed to my cluster-creation command: --properties core:fs.defaultFS=gs://my-bucket. Note that I haven't included all the parameters here; when I run the command without this flag, it creates the cluster successfully. When I pass the flag, however, I get: "failed: Cannot start master: Insufficient number of DataNodes reporting."
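For concreteness, a minimal form of the failing invocation looks like the following; the cluster name is a placeholder, since the original post omits the other parameters:

gcloud dataproc clusters create my-cluster \
    --properties core:fs.defaultFS=gs://my-bucket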

If anyone has managed to create a Dataproc cluster with fs.defaultFS set, I'd be glad to hear how. Thanks.

Recommended Answer

It's true that there are still some known issues due to certain dependencies on actual HDFS. The docs were not intended to imply that setting fs.defaultFS to a GCS path at cluster-creation time would work, but simply to provide a convenient example of a property that appears in core-site.xml; in theory it would work to set fs.defaultFS to a different, preexisting HDFS cluster, for example. I've filed a ticket to change the example in the documentation to avoid confusion.

Two options:

  1. Override fs.defaultFS at job-submission time using per-job properties (see the sketch after this list).
  2. Work around some of the known issues by setting fs.defaultFS explicitly in an initialization action instead of via cluster properties.
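As a sketch of option 1, fs.defaultFS can be passed as a per-job property when submitting a Hadoop job with gcloud; the cluster name and jar path here are placeholder assumptions, not values from the original post:

gcloud dataproc jobs submit hadoop \
    --cluster my-cluster \
    --jar gs://my-bucket/my-job.jar \
    --properties fs.defaultFS=gs://my-bucket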

Option 1 is better understood to work, because cluster-level HDFS dependencies won't change. Option 2 works because most of the incompatibilities occur only during initial startup, and initialization actions run after the relevant daemons have already started. To override the setting in an init action, use bdconfig:

bdconfig set_property \
    --name 'fs.defaultFS' \
    --value 'gs://my-bucket' \
    --configuration_file /etc/hadoop/conf/core-site.xml \
    --clobber
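To wire this up as option 2, the bdconfig command above would be saved in a script, uploaded to GCS, and passed at cluster creation via the --initialization-actions flag; the script name and bucket path below are placeholders:

gsutil cp set-default-fs.sh gs://my-bucket/set-default-fs.sh
gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/set-default-fs.sh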

That concludes this article on whether a Dataproc cluster can be created with the fs.defaultFS property set; hopefully the recommended answer is helpful.
