Problem Description
I am running my Google Dataflow job on Google Cloud Platform (GCP). When I run this job locally it works well, but when I run it on GCP I get this error: "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I have access to that Google Cloud URI: I can upload my jar file to it, and I can see some temporary files from my local job.
My job IDs in GCP:
2019-08-08_21_47_27-162804342585245230 (Beam version: 2.12.0)
2019-08-09_16_41_15-11728697820819900062 (Beam version: 2.14.0)
I have tried Beam versions 2.12.0 and 2.14.0; both fail with the same error.
java.lang.IllegalArgumentException: No filesystem found for scheme gs
at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:689)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:284)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:206)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:190)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:169)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Recommended Answer
This may be caused by a couple of issues if you build a "fat jar" that bundles all of your dependencies.
- You must include the dependency org.apache.beam:google-cloud-platform-core to have the Beam GCS filesystem (a sample Maven declaration follows this list).
- Inside your fat jar, you must preserve the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar file with a line org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar. You can find this file in the jar from step 1. You will probably have many files with the same name in your dependencies, registering different Beam filesystems. You need to configure Maven or Gradle to combine these as part of your build, or they will overwrite each other and not work properly (a shade-plugin sketch follows below).
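For the first point, a minimal sketch of the Maven dependency declaration is below. The exact artifact ID (beam-sdks-java-extensions-google-cloud-platform-core) and the version are my assumptions based on the Beam versions mentioned above, so verify them against the release you actually use:

    <dependency>
      <groupId>org.apache.beam</groupId>
      <!-- artifact ID and version are assumptions; check them against your Beam release -->
      <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
      <version>2.14.0</version>
    </dependency>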
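For the second point, the usual Maven approach is the maven-shade-plugin with a ServicesResourceTransformer, which concatenates the META-INF/services files from all dependencies instead of letting them overwrite one another. A minimal sketch, with the plugin version chosen as an assumption:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- plugin version is an assumption; use whatever your build standardizes on -->
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- merges META-INF/services files so every FileSystemRegistrar entry survives -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>

With Gradle, the Shadow plugin's mergeServiceFiles() option inside the shadowJar task performs the same merge.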