Finding the Cartesian product of 2 lists with Apache Beam

Problem description

I have 2 PCollections:

PCollection<List<String>> ListA =
        pipeline.apply("getListA", ParDo.of(new getListA()));
PCollection<List<String>> ListB =
        pipeline.apply("getListB", ParDo.of(new getListB()));

ListA contains:

["1","2","3"]

ListB contains:

["A","B","C"]

How do I end up with a PCollection that contains:

[
 ["A","1"],["A","2"],["A","3"],
 ["B","1"],["B","2"],["B","3"],
 ["C","1"],["C","2"],["C","3"],
]

My search has pointed me to:

How to do a cartesian product of two PCollections in Dataflow?

But that answer deals with KV elements, using CoGroupByKey with 2 outputs. It's possible that CoGroupByKey can be used to create the Cartesian product of 2 lists, but I am not seeing how.

Solution

It looks like you have a single element in each PCollection, so you just need to join those elements, and then you can compute the Cartesian product yourself in a DoFn.

Something like:

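// Merge both single-element PCollections, then group under one shared null key.
PCollectionList.of(ListA).and(ListB)
    .apply(Flatten.pCollections())
    .apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create());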

After that, you'll have a PCollection with a single element, which is a KV(null, Iterable(ListA, ListB)), and you can generate the cartesian product with some for loops.
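
The "some for loops" step can be sketched in plain Java, independent of Beam. This is only a minimal illustration of the loop logic one might place inside a DoFn's processing method; the class name `CartesianProduct` and the helper `product` are hypothetical names, not part of the Beam API, and the pipeline wiring is left out. Note also that the order of values in a grouped Iterable is generally not guaranteed, so a real pipeline would need some way to tell which list is which before calling a helper like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CartesianProduct {
    // Hypothetical helper (not a Beam API): pairs every element of `outer`
    // with every element of `inner`, iterating `outer` in the outer loop.
    static List<List<String>> product(List<String> outer, List<String> inner) {
        List<List<String>> pairs = new ArrayList<>(outer.size() * inner.size());
        for (String o : outer) {
            for (String i : inner) {
                pairs.add(Arrays.asList(o, i));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> listA = Arrays.asList("1", "2", "3");
        List<String> listB = Arrays.asList("A", "B", "C");
        // Letters go in the outer loop to match the ordering asked for in the question.
        System.out.println(product(listB, listA));
    }
}
```

Inside a DoFn, each pair could be emitted as its own output element instead of being collected into one list, depending on whether the downstream steps want nine small elements or a single nested list.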
