Problem description
I have 2 PCollections:
PCollection<List<String>> ListA =
    pipeline.apply("getListA", ParDo.of(new getListA()));
PCollection<List<String>> ListB =
    pipeline.apply("getListB", ParDo.of(new getListB()));
ListA contains:
["1","2","3"]
ListB contains:
["A","B","C"]
How do I end up with a PCollection that contains:
[
["A","1"],["A","2"],["A","3"],
["B","1"],["B","2"],["B","3"],
["C","1"],["C","2"],["C","3"],
]
My search has pointed me to:
How to do a cartesian product of two PCollections in Dataflow?
But that answer deals with KV elements, using CoGroupByKey with 2 outputs. It's possible that CoGroupByKey can be used to create the cartesian product of 2 lists, but I am not seeing how.
It looks like you have a single element in each PCollection, so you just need to join those elements, and then you can do the cartesian product yourself in a DoFn.
Something like:
PCollectionList.of(ListA).and(ListB)
    .apply(Flatten.pCollections())
    .apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create())
After that, you'll have a PCollection with a single element, which is a KV(null, Iterable(ListA, ListB)), and you can generate the cartesian product with some for loops.
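The "some for loops" step could look roughly like the sketch below. It is plain Java with no Beam dependency, and the class and method names (CartesianProductSketch, cartesianProduct) are made up for illustration. It also assumes you already know which of the two grouped lists is ListA and which is ListB. GroupByKey does not guarantee the order of the values in the Iterable, so in a real pipeline you would need some way to tell them apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is not from the original answer.
class CartesianProductSketch {

    // Pairs every element of listB with every element of listA, listB element first,
    // matching the ["A","1"], ["A","2"], ... ordering asked for in the question.
    static List<List<String>> cartesianProduct(List<String> listA, List<String> listB) {
        List<List<String>> product = new ArrayList<>(listA.size() * listB.size());
        for (String b : listB) {
            for (String a : listA) {
                product.add(Arrays.asList(b, a));
            }
        }
        return product;
    }

    public static void main(String[] args) {
        List<String> listA = Arrays.asList("1", "2", "3");
        List<String> listB = Arrays.asList("A", "B", "C");
        // Print one pair per line.
        for (List<String> pair : cartesianProduct(listA, listB)) {
            System.out.println(pair);
        }
    }
}
```

Inside a DoFn, the same two loops would emit each pair as an output element instead of collecting them into a list.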