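How do I sort in descending order with the Apache Beam framework?
I managed to create a word-count pipeline that sorts its output alphabetically by word, but I can't figure out how to reverse the sort order.
Here is the code: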
public class SortedWordCount {

    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline p = Pipeline.create(options);
        BufferedExternalSorter.Options options1 = BufferedExternalSorter.options();

        p.apply(TextIO.read().from("d:/dev/playground/apache/beam/word-count-beam/src/test/resources/bible/whole_bible.txt"))
            .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    for (String word : c.element().split(ExampleUtils.TOKENIZER_PATTERN)) {
                        if (!word.isEmpty()) {
                            c.output(word);
                        }
                    }
                }
            }))
            .apply(Count.perElement())
            .apply(ParDo.of(new DoFn<KV<String, Long>, KV<String, Long>>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    KV<String, Long> element = c.element();
                    if (element.getKey().length() > 2) {
                        c.output(element);
                    }
                }
            }))
            .apply("CreateKey", MapElements.via(new SimpleFunction<KV<String, Long>, KV<String, KV<String, Long>>>() {
                public KV<String, KV<String, Long>> apply(KV<String, Long> input) {
                    return KV.of("sort", KV.of(input.getKey().toLowerCase(), input.getValue()));
                }
            }))
            .apply(GroupByKey.create())
            .apply(SortValues.create(options1))
            .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Iterable<KV<String, Long>>>, String>() {
                @Override
                public String apply(KV<String, Iterable<KV<String, Long>>> input) {
                    return StreamSupport.stream(input.getValue().spliterator(), false)
                        .map(value -> String.format("%20s: %s", value.getKey(), value.getValue()))
                        .collect(Collectors.joining(String.format("%n")));
                }
            }))
            .apply(TextIO.write().to("bible"));

        // Run the pipeline.
        p.run().waitUntilFinish();
    }
}
This code produces an alphabetically sorted list of words with their respective counts:
aaron: 350
aaronites: 2
abaddon: 1
abagtha: 1
abana: 1
abarim: 4
abase: 4
abased: 4
abasing: 1
abated: 6
abba: 3
abda: 2
abdeel: 1
abdi: 3
abdiel: 1
abdon: 8
abednego: 15
abel: 16
abelbethmaachah: 2
abelmaim: 1
Edit 1:
After some debugging, I found that the code uses this class:
org.apache.beam.sdk.extensions.sorter.InMemorySorter
This class uses a static final comparator when its sort method runs:
private static final Comparator<byte[]> COMPARATOR = UnsignedBytes.lexicographicalComparator();

public Iterable<KV<byte[], byte[]>> sort() {
    checkState(!sortCalled, "sort() can only be called once.");
    sortCalled = true;
    Comparator<KV<byte[], byte[]>> kvComparator =
        new Comparator<KV<byte[], byte[]>>() {
            @Override
            public int compare(KV<byte[], byte[]> o1, KV<byte[], byte[]> o2) {
                return COMPARATOR.compare(o1.getKey(), o2.getKey());
            }
        };
    Collections.sort(records, kvComparator);
    return Collections.unmodifiableList(records);
}
There is no way to inject a comparator into this class.
Best answer
You can extract the Iterable<KV<String, Long>> into a List<KV<String, Long>> and reverse the list with Collections.reverse().
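As a minimal sketch of that idea, the helper below copies an already-sorted Iterable into a list and reverses it. The class name ReverseSortedValues and the method toReversedList are hypothetical names chosen for illustration, not part of Beam. The helper is generic, so it runs without Beam on the classpath and should work the same way for Iterable<KV<String, Long>>. Note that it assumes all values for the key fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not Beam APIs.
public class ReverseSortedValues {

    // Copies the already-sorted values into a list and reverses it in place.
    // Assumption: the whole Iterable fits in memory.
    public static <T> List<T> toReversedList(Iterable<T> sortedValues) {
        List<T> result = new ArrayList<>();
        for (T value : sortedValues) {
            result.add(value);
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the ascending output of SortValues.
        List<String> ascending = Arrays.asList("aaron", "abaddon", "abel");
        System.out.println(toReversedList(ascending)); // prints [abel, abaddon, aaron]
    }
}
```

In the question's "FormatResults" step, this would presumably mean replacing StreamSupport.stream(input.getValue().spliterator(), false) with toReversedList(input.getValue()).stream(), leaving the rest of the formatting unchanged.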