Problem description
I'm trying to get a sample of the items in a PCollection using the Python SDK on Dataflow / Beam.
While it's not documented, Sample.FixedSizeGlobally(n) exists.
When testing, it seems to return a PCollection with a single item (a list containing the samples) rather than a PCollection whose elements are the samples. Is that correct?
Is the following the best way of turning that single-item PCollection into a PCollection of the items?
| Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)
Recommended answer
Currently, yes. The Sample.FixedSizeGlobally() transform returns a PCollection with a single list element. You can turn it into a PCollection of individual elements, as you said:
Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)
We'll make sure to add a PCollection-to-PCollection (PC-PC) transform, and we also welcome your contributions to Beam :). In the meantime, that's what we've got.