Problem Description
I am running my Google Dataflow job in Google Cloud Platform (GCP). When I run this job locally it works well, but when running it on GCP I get this error: "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I have access to that Google Cloud Storage URI: I can upload my jar file to it, and I can see some temporary files from my local run.
My job IDs in GCP:
2019-08-08_21_47_27-162804342585245230 (Beam version: 2.12.0)
2019-08-09_16_41_15-11728697820819900062 (Beam version: 2.14.0)
I have tried Beam versions 2.12.0 and 2.14.0; both fail with the same error.
java.lang.IllegalArgumentException: No filesystem found for scheme gs
at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:689)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:284)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:206)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:190)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:169)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Recommended Answer

This may be caused by a couple of issues if you build a "fat jar" that bundles all of your dependencies.
1. You must include the dependency org.apache.beam:google-cloud-platform-core to have the Beam GCS filesystem (a dependency sketch follows this list).
2. Inside your fat jar, you must preserve the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar file with a line org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar. You can find this file in the jar from step 1. You will probably have many files with the same name in your dependencies, registering different Beam filesystems. You need to configure Maven or Gradle to combine these as part of your build, or they will overwrite each other and not work properly (a Shade-plugin sketch follows this list).
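For step 1, here is a minimal sketch of the dependency, assuming a Maven build. The coordinates in step 1 are shorthand; the artifact Beam publishes for the GCS filesystem (and the jar that carries GcsFileSystemRegistrar) is beam-sdks-java-extensions-google-cloud-platform-core. The version below is illustrative; match it to your Beam SDK version:

    <!-- pom.xml: pulls in the Beam GCS filesystem and its GcsFileSystemRegistrar -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
      <version>2.14.0</version>
    </dependency>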
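For step 2 with Maven, here is a minimal sketch using the Maven Shade plugin's ServicesResourceTransformer, which concatenates identically named META-INF/services files instead of letting one overwrite the others. The plugin version is illustrative; Gradle users can get the same effect with the Shadow plugin's mergeServiceFiles():

    <!-- pom.xml: build a fat jar with merged service registrations -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Concatenate META-INF/services/* entries from all dependencies,
                   so every FileSystemRegistrar line (including the GCS one) survives -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>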