This post covers how to sample the elements of a PCollection in Dataflow/Beam using the Python SDK.

Question

I'm trying to get a sample of the items in a PCollection using the Python SDK on Dataflow/Beam.

While it's not documented, Sample.FixedSizeGlobally(n) exists.

When testing, it seems to return a PCollection with a single item (a list containing the samples) rather than a PCollection of the samples themselves. Is that correct?

Is the following the best way of turning that single-item PCollection into a PCollection of the items?

| Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)

Recommended answer

Currently, yes. The Sample.FixedSizeGlobally() transform returns a PCollection with a single element, which is a list of the samples. You can turn it into a PCollection of individual elements as you said:

| Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)

We'll make sure to add a PCollection-to-PCollection transform, and we also welcome your contributions to Beam :) In the meantime, that's what we've got.


08-23 16:12