



This was already discussed in a previous post, but I'm not convinced by the answers, since the Google docs state that it is possible to create a cluster with the fs.defaultFS property set. Moreover, even if this property can be set programmatically, it's sometimes more convenient to set it from the command line.

So I'd like to know why the following option does not work when passed to my cluster creation command: --properties core:fs.defaultFS=gs://my-bucket. Note that I haven't listed all the parameters; the same command without this flag created the cluster successfully. With the flag, however, I get: "failed: Cannot start master: Insufficient number of DataNodes reporting."


If anyone has managed to create a Dataproc cluster with fs.defaultFS set, I'd be glad to hear how. Thanks.



It's true that there are still known issues caused by certain dependencies on actual HDFS. The docs were not meant to imply that setting fs.defaultFS to a GCS path at cluster-creation time would work; fs.defaultFS was simply a convenient example of a property that appears in core-site.xml. In theory, setting fs.defaultFS to a different, preexisting HDFS cluster would work, for example. I've filed a ticket to change the example in the documentation to avoid confusion.


In the meantime, there are two options:

  1. Override fs.defaultFS at job-submission time using per-job properties.
  2. Work around some of the known issues by setting fs.defaultFS explicitly in an initialization action instead of through cluster properties.
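
As a rough sketch of the first option, a Hadoop job could be submitted with the property overridden for that job only. The cluster name, region, jar path, and job arguments below are placeholders, not values from the question, and the wrapper only prints the command unless DRY_RUN is cleared:

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the original question).
CLUSTER="my-cluster"
REGION="us-central1"
DEFAULT_FS="gs://my-bucket"

# Dry run by default: the command is printed, not executed.
# Set DRY_RUN to the empty string to actually submit the job.
DRY_RUN="${DRY_RUN-1}"

submit_job() {
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
  ${DRY_RUN:+echo} gcloud dataproc jobs submit hadoop \
    --cluster="$CLUSTER" \
    --region="$REGION" \
    --properties="fs.defaultFS=$DEFAULT_FS" \
    --jar="file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar" \
    -- wordcount input/ output/
}

submit_job
```

For a Spark job, the property would presumably need the spark.hadoop. prefix (spark.hadoop.fs.defaultFS) so that Spark forwards it to the Hadoop configuration; I haven't verified that against every image version.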


Option 1 is the better-understood one, because cluster-level HDFS dependencies don't change. Option 2 works because most of the incompatibilities occur only during initial startup, and initialization actions run after the relevant daemons have already started. To override the setting in an init action, you'd use bdconfig:

# Overwrite the existing fs.defaultFS entry in core-site.xml
bdconfig set_property \
    --name 'fs.defaultFS' \
    --value 'gs://my-bucket' \
    --configuration_file /etc/hadoop/conf/core-site.xml \
    --clobber
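
To wire this in, the bdconfig call would typically live in a small script that the cluster runs as an initialization action. The following is only a sketch: the script name set-default-fs.sh, the gs://my-bucket/init/ upload path, the cluster name, and the region are all made up for illustration, and --clobber is included on the assumption that the existing value has to be overwritten. The snippet writes the script locally; the upload and cluster-creation steps are left as comments.

```shell
#!/bin/sh
# Write a hypothetical init action script to the current directory.
cat > set-default-fs.sh <<'EOF'
#!/usr/bin/env bash
# Hypothetical init action: point fs.defaultFS at a GCS bucket
# after the Hadoop daemons have started.
set -euo pipefail

bdconfig set_property \
    --name 'fs.defaultFS' \
    --value 'gs://my-bucket' \
    --configuration_file /etc/hadoop/conf/core-site.xml \
    --clobber
EOF

# Next steps (not executed here; bucket path, cluster, and region are placeholders):
#   gsutil cp set-default-fs.sh gs://my-bucket/init/set-default-fs.sh
#   gcloud dataproc clusters create my-cluster \
#       --region=us-central1 \
#       --initialization-actions=gs://my-bucket/init/set-default-fs.sh
```

Note that the cluster is created without the --properties core:fs.defaultFS=... flag here, so startup still runs against the cluster's own HDFS.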

