Problem description
I have a PCollection that looks like this:
PCollection<KV<KV<String, EventSession>, Long>> windowed_counts
My goal is to write this out as a text file. I thought to use something like:
windowed_counts.apply( TextIO.Write.to( "output" ));
but I am having a hard time getting the Coders set up correctly. This is what I thought would work:
KvCoder kvcoder = KvCoder.of(KvCoder.of(StringUtf8Coder.of(), AvroDeterministicCoder.of(EventSession.class) ), TextualLongCoder.of());
TextIO.Write.Bound io = TextIO.Write.withCoder( kvcoder );
windowed_counts.apply( io.to( "output" ));
where TextualLongCoder is my own subclass of AtomicCoder, analogous to TextualIntegerCoder. The EventSession class is annotated to use AvroDeterministicCoder as its default coder.
But with this I get garbled output that includes non-textual characters, etc. Can anybody advise on how to write this particular PCollection out as text? I'm sure there's something obvious I'm missing here...
Recommended answer
Did you try creating a transform that will convert a PCollection of KV<KV<String, EventSession>, Long> to a PCollection of Strings and then writing it into a text file?
I found it to be the most flexible approach for my needs.
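As a rough illustration of that idea, here is a minimal sketch of the per-element formatting step in plain Java. It is not the answerer's code. Everything in it is an assumption made for illustration: the EventSession stand-in (the real class and its fields are not shown in the question), the formatLine helper, the tab-separated layout, and the use of java.util.Map.Entry in place of the SDK's KV so the snippet runs without the Dataflow SDK. In a real pipeline this logic would presumably sit inside a ParDo whose DoFn takes KV<KV<String, EventSession>, Long> and outputs String, and the resulting PCollection of Strings could then go to TextIO.Write.to( "output" ) without a custom coder.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class WindowedCountFormatter {

    /** Hypothetical stand-in for the asker's EventSession; the real fields are unknown. */
    public static final class EventSession {
        private final String sessionId;

        public EventSession(String sessionId) {
            this.sessionId = sessionId;
        }

        @Override
        public String toString() {
            return sessionId;
        }
    }

    /**
     * Turns one ((key, session), count) element into a single line of text.
     * Map.Entry stands in for the SDK's KV type; the tab-separated layout is
     * an assumption, not something specified in the question.
     */
    public static String formatLine(Map.Entry<Map.Entry<String, EventSession>, Long> element) {
        Map.Entry<String, EventSession> key = element.getKey();
        return key.getKey() + "\t" + key.getValue() + "\t" + element.getValue();
    }

    public static void main(String[] args) {
        Map.Entry<String, EventSession> key =
                new SimpleImmutableEntry<>("user-1", new EventSession("s-42"));
        Map.Entry<Map.Entry<String, EventSession>, Long> element =
                new SimpleImmutableEntry<>(key, 3L);
        // Prints the three fields separated by tabs.
        System.out.println(formatLine(element));
    }
}
```

Doing the conversion explicitly keeps the text layout under your control, whereas a coder built on Avro's binary encoding would be expected to emit bytes that are not readable text, which likely explains the garbled output described in the question.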