Problem description
I have 2 PCollections:
PCollection<List<String>> ListA =
    pipeline.apply("getListA", ParDo.of(new getListA()));
PCollection<List<String>> ListB =
    pipeline.apply("getListB", ParDo.of(new getListB()));
ListA contains:
["1","2","3"]
ListB contains:
["A","B","C"]
How do I end up with a PCollection that contains:
[
["A","1"],["A","2"],["A","3"],
["B","1"],["B","2"],["B","3"],
["C","1"],["C","2"],["C","3"],
]
My search pointed me to:
But that deals with KVs, using CoGroupByKey with 2 outputs. It may be possible to use CoGroupByKey to create the Cartesian product of 2 lists, but I am not seeing how.
Recommended answer
It looks like you have a single element in each PCollection, so you just need to join those two elements, and then you can compute the Cartesian product yourself in a DoFn.
Something like:
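// Merge both PCollections, then group everything under a single null key.
PCollectionList.of(ListA).and(ListB)
    .apply(Flatten.pCollections())
    .apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create())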
After that, you'll have a PCollection with a single element, a KV(null, Iterable(ListA, ListB)), and you can generate the Cartesian product with a couple of nested for loops.
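The nested-loop step could look roughly like the following. This is a minimal sketch in plain Java rather than a full Beam DoFn: the class name CartesianProduct and the method product are hypothetical helpers introduced only for illustration, and the sketch assumes you already know which of the two grouped lists is which. The grouped Iterable does not guarantee any ordering of its values, so in a real pipeline you would need some way to tell ListA from ListB, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Apache Beam: the nested-loop step that a
// DoFn could run once it has both lists in hand.
class CartesianProduct {
  // Pairs every element of `outer` with every element of `inner`.
  // Elements of `outer` come first in each pair and vary slowest.
  static List<List<String>> product(List<String> outer, List<String> inner) {
    List<List<String>> pairs = new ArrayList<>();
    for (String o : outer) {
      for (String i : inner) {
        pairs.add(Arrays.asList(o, i));
      }
    }
    return pairs;
  }
}
```

Calling CartesianProduct.product(listB, listA) with the lists from the question puts the letters first in each pair, matching the layout asked for. Inside a DoFn you would emit each pair, or the whole list, from the method annotated with @ProcessElement.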