如何使用apache beam框架按降序排序?
我设法创建了一个单词计数管道,该管道按单词的字母顺序对输出进行排序,但不知道如何反转排序顺序。
代码如下:

public class SortedWordCount {

    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline p = Pipeline.create(options);

        BufferedExternalSorter.Options options1 = BufferedExternalSorter.options();

        p.apply(TextIO.read().from("d:/dev/playground/apache/beam/word-count-beam/src/test/resources/bible/whole_bible.txt"))
                .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        for (String word : c.element().split(ExampleUtils.TOKENIZER_PATTERN)) {
                            if (!word.isEmpty()) {
                                c.output(word);
                            }
                        }
                    }
                }))
                .apply(Count.perElement())
                .apply(ParDo.of(new DoFn<KV<String, Long>, KV<String, Long>>() {
                    @ProcessElement
                    public void processElement(ProcessContext c){
                        KV<String, Long> element = c.element();
                        if(element.getKey().length() > 2) {
                            c.output(element);
                        }
                    }
                }))
                .apply("CreateKey", MapElements.via(new SimpleFunction<KV<String, Long>, KV<String, KV<String, Long>>>() {
                    public KV<String, KV<String, Long>> apply(KV<String, Long> input) {
                        return KV.of("sort", KV.of(input.getKey().toLowerCase(), input.getValue()));
                    }
                }))
                .apply(GroupByKey.create())
                .apply(SortValues.create(options1))
                .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Iterable<KV<String, Long>>>, String>() {
                    @Override
                    public String apply(KV<String, Iterable<KV<String, Long>>> input) {
                        return StreamSupport.stream(input.getValue().spliterator(), false)
                                .map(value -> String.format("%20s: %s", value.getKey(), value.getValue()))
                                .collect(Collectors.joining(String.format("%n")));
                    }
                }))
                .apply(TextIO.write().to("bible"));
        // Run the pipeline.
        p.run().waitUntilFinish();
    }
}

此代码生成按字母顺序排列的单词列表及其各自的计数:
           aaron: 350
       aaronites: 2
         abaddon: 1
         abagtha: 1
           abana: 1
          abarim: 4
           abase: 4
          abased: 4
         abasing: 1
          abated: 6
            abba: 3
            abda: 2
          abdeel: 1
            abdi: 3
          abdiel: 1
           abdon: 8
        abednego: 15
            abel: 16
 abelbethmaachah: 2
        abelmaim: 1

编辑1:
经过一些调试,我知道代码使用了类:
org.apache.beam.sdk.extensions.sorter.InMemorySorter

此类在执行排序方法期间使用静态最终比较器:
private static final Comparator<byte[]> COMPARATOR = UnsignedBytes.lexicographicalComparator();

public Iterable<KV<byte[], byte[]>> sort() {
  checkState(!sortCalled, "sort() can only be called once.");

  sortCalled = true;

  Comparator<KV<byte[], byte[]>> kvComparator =
    new Comparator<KV<byte[], byte[]>>() {

      @Override
      public int compare(KV<byte[], byte[]> o1, KV<byte[], byte[]> o2) {
        return COMPARATOR.compare(o1.getKey(), o2.getKey());
      }
    };
  Collections.sort(records, kvComparator);
  return Collections.unmodifiableList(records);
}

在这个类中没有办法注入比较器。

最佳答案

您可以将Iterable<KV<String, Long>>提取到List<KV<String, Long>>中,并使用Collections.reverse()反转列表。

07-24 21:17