This article covers how to handle a Dataflow template job that does not pick up its input parameter.

Problem description

I have a Dataflow template created with the command below:

    python scrap.py --setup_file /home/deepak_verma/setup.py \
        --temp_location gs://visualization-dev/temp \
        --staging_location gs://visualization-dev/stage \
        --project visualization-dev --job_name scrap-job \
        --subnetwork regions/us-east1/subnetworks/dataflow-internal \
        --region us-east1 --input sentiment_analysis.table_view \
        --output gs://visualization-dev/incoming \
        --runner DataflowRunner \
        --template_location gs://visualization-dev/template/scrap

My Dataflow pipeline accepts the input and output parameters as value providers, like this:

@classmethod
def _add_argparse_args(cls, parser):
    parser.add_value_provider_argument(
        '--input', dest='input', required=True,
        help='Input view. sentiment_analysis.table_view',
    )

    parser.add_value_provider_argument(
        '--output', dest='output', required=True,
        help='output gcs file path'
    )

I use it like this:

beam.io.Read(beam.io.BigQuerySource(query=read_query.format(
        table=options.input.get(), limit=(LIMIT and "limit " + str(LIMIT) or '')), use_standard_sql=True)))

where read_query is defined as "SELECT upc, max_review_date FROM `{table}`".

Now, when I launch this template with a different input parameter:

from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials

# Launch the template, passing a different value for the 'input' parameter.
template_body = {
    'jobName': job_name,
    'parameters': {'input': 'table_view2'},
}
credentials = GoogleCredentials.get_application_default()
service = build('dataflow', 'v1b3', credentials=credentials)
request = service.projects().locations().templates().launch(
    projectId=constants.GCP_PROJECT_ID, location=constants.REGION,
    gcsPath=template_gcs_path, body=template_body)

Dataflow does not run the job against table_view2; it uses table_view instead.

Recommended answer

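What you need is to be able to pass the query as a ValueProvider, and not as an already-formatted string. Because options.input.get() is called while the pipeline is being constructed, the query is formatted once, with the value supplied at template creation (sentiment_analysis.table_view), and that fixed string is what ends up in the template. Passing the query itself as a ValueProvider is not yet possible in Beam.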

There's an open feature request here: https://issues.apache.org/jira/browse/BEAM-1440
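
To make the distinction concrete, here is a minimal pure-Python sketch of eager versus deferred evaluation. It does not use Beam at all: DeferredValue, build_eager, and build_deferred are hypothetical names made up for this illustration, and the class only loosely imitates the role a value provider plays. It is meant to show why calling .get() at construction time freezes the query, not how Beam implements templates.

```python
# Pure-Python stand-in for the idea behind a value provider.
# This is NOT Beam's API; every name below is hypothetical.
READ_QUERY = "SELECT upc, max_review_date FROM `{table}`"


class DeferredValue:
    """Holds a creation-time value that may be overridden at launch time."""

    def __init__(self, default):
        self._default = default
        self._runtime = None

    def set_runtime(self, value):
        self._runtime = value

    def get(self):
        return self._default if self._runtime is None else self._runtime


def build_eager(table):
    # .get() runs while the "template" is built, so the string is frozen here.
    query = READ_QUERY.format(table=table.get())
    return lambda: query


def build_deferred(table):
    # .get() runs only when the returned function is called, i.e. at "run time".
    return lambda: READ_QUERY.format(table=table.get())


table = DeferredValue("sentiment_analysis.table_view")
eager_query = build_eager(table)
deferred_query = build_deferred(table)

# Simulate launching the template with a different parameter.
table.set_runtime("table_view2")

print(eager_query())     # still refers to sentiment_analysis.table_view
print(deferred_query())  # refers to table_view2
```

The eager variant mirrors the question's code, where the formatted string is handed to the source before the launch parameter exists. The deferred variant is what the feature request asks the BigQuery source to support.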

