java - Nested collectors in Java 8

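I am working with demographic data. I have a collection of records about the different counties of a state (several records per county), and I want to aggregate them by county.

I implemented the following consumer: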

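import java.util.ArrayList ;
import java.util.List ;
import java.util.function.Consumer ;

// Mutable container that accumulates the demographics of a single county.
public class CountyPopulation implements Consumer<Population>
{
    private String countyId ;
    private List<Demographic> demographics ;

    public CountyPopulation()
    {
        demographics = new ArrayList<>() ;
    }

    public List<Demographic> getDemographics()
    {
        return demographics ;
    }

    // Accumulator: records the county id on first use, then adds the demographic.
    @Override
    public void accept(Population pop)
    {
        if ( countyId == null )
        {
            countyId = pop.getCtyId() ;
        }
        demographics.add( pop.getDemographic() ) ;
    }

    // Combiner: merges another partial result into this one (returns nothing).
    public void combine(CountyPopulation other)
    {
        demographics.addAll( other.getDemographics() ) ;
    }
}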

This CountyPopulation is used to aggregate the data for one specific county with the following code (where "089" is the county identifier):

CountyPopulation ctyPop = populations
    .stream()
    .filter( e -> "089".equals( e.getCtyId() ) )
    .collect(CountyPopulation::new,
             CountyPopulation::accept,
             CountyPopulation::combine) ;

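Now I would like to remove the filter and group the records by county before applying the aggregator.

Based on your first answer, I understand this can be done with the static function Collector.of, as follows: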

Map<String,CountyPopulation> pop = populations
    .stream()
    .collect(
        Collectors.groupingBy(Population::getCtyId,
                              Collector.of( CountyPopulation::new,
                                            CountyPopulation::accept,
                                            (a,b)->{a.combine(b); return a;} ))) ;

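However, this code does not work, because the signature of Collector.of() differs from that of collect().
I suspect the solution involves modifying the CountyPopulation class so that it implements java.util.function.BiConsumer instead of java.util.function.Consumer, but my attempts to do so did not work, and I don't know why.

Best answer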

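Calling collect on a Stream with the three arguments is nearly equivalent to using Collector.of. The one difference is the combiner: Stream.collect takes a BiConsumer, while Collector.of takes a BinaryOperator, which must return the merged container.

So you can achieve your goal as follows, provided combine returns the merged CountyPopulation (for example, by ending it with return this, or by wrapping it in a lambda such as (a,b)->{a.combine(b); return a;}):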

Map<String,CountyPopulation> pop = populations.stream().collect(
  Collectors.groupingBy(Population::getCtyId, Collector.of(
    CountyPopulation::new, CountyPopulation::accept, CountyPopulation::combine))) ;
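
To make the combiner point concrete, here is a minimal, self-contained sketch of the nested collector. It is only an illustration under stated assumptions: the Population class is a hypothetical stand-in, a plain String replaces the Demographic type (which the question does not show), and combine has been changed to return this so that it fits BinaryOperator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Population record; a String
// replaces the Demographic type, which the question does not show.
class Population {
    private final String ctyId;
    private final String demographic;
    Population(String ctyId, String demographic) {
        this.ctyId = ctyId;
        this.demographic = demographic;
    }
    String getCtyId() { return ctyId; }
    String getDemographic() { return demographic; }
}

class CountyPopulation implements Consumer<Population> {
    private String countyId;
    private final List<String> demographics = new ArrayList<>();

    List<String> getDemographics() { return demographics; }

    @Override
    public void accept(Population pop) {
        if (countyId == null) countyId = pop.getCtyId();
        demographics.add(pop.getDemographic());
    }

    // Assumption: unlike the question's version, combine returns this,
    // so the method reference matches BinaryOperator<CountyPopulation>.
    CountyPopulation combine(CountyPopulation other) {
        if (countyId == null) countyId = other.countyId;
        demographics.addAll(other.getDemographics());
        return this;
    }
}

class NestedCollectorSketch {
    static Map<String, CountyPopulation> byCounty(List<Population> populations) {
        // Downstream collector: one CountyPopulation per group.
        Collector<Population, CountyPopulation, CountyPopulation> perCounty =
            Collector.of(CountyPopulation::new,
                         CountyPopulation::accept,
                         CountyPopulation::combine);
        return populations.stream()
                          .collect(Collectors.groupingBy(Population::getCtyId, perCounty));
    }

    public static void main(String[] args) {
        List<Population> populations = Arrays.asList(
            new Population("089", "a"),
            new Population("001", "b"),
            new Population("089", "c"));
        Map<String, CountyPopulation> pop = byCounty(populations);
        System.out.println(pop.get("089").getDemographics()); // [a, c]
    }
}
```

Declaring the downstream collector in a typed local variable is a readability choice here; it also gives the compiler an explicit target type for the method references.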

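For better parallel performance, it is worth studying the optional Characteristics you can supply. If either or both of UNORDERED and CONCURRENT match the behavior of your CountyPopulation class, you can provide them (IDENTITY_FINISH is implied in your case). Note that CONCURRENT only fits if accept can safely be called from several threads on the same instance, which an ArrayList-backed class does not support as written.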
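
As a rough illustration of passing a characteristic, the sketch below declares UNORDERED and then inspects what the resulting collector reports. DemographicBag is a hypothetical container standing in for CountyPopulation, and it collects plain Strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;

// Hypothetical container standing in for CountyPopulation.
class DemographicBag {
    final List<String> items = new ArrayList<>();
    void accept(String item) { items.add(item); }
    DemographicBag combine(DemographicBag other) {
        items.addAll(other.items);
        return this;
    }
}

class CharacteristicsSketch {
    // UNORDERED declares that the collector does not commit to preserving
    // encounter order. CONCURRENT is deliberately omitted: ArrayList is not
    // thread-safe, so accept must not run on one container from several threads.
    static Collector<String, DemographicBag, DemographicBag> unorderedBag() {
        return Collector.of(DemographicBag::new,
                            DemographicBag::accept,
                            DemographicBag::combine,
                            Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Set<Collector.Characteristics> cs = unorderedBag().characteristics();
        // IDENTITY_FINISH is added automatically because no finisher was given.
        System.out.println(cs.contains(Collector.Characteristics.IDENTITY_FINISH)); // true
        System.out.println(cs.contains(Collector.Characteristics.UNORDERED));       // true
    }
}
```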

Using groupingByConcurrent instead of groupingBy may also improve parallel performance.
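
A minimal sketch of that variant on a parallel stream follows. The Population and CountyPopulation classes are hypothetical, trimmed-down versions of the ones in the question (String instead of Demographic, combine returning this), and whether this actually runs faster depends on the data and should be measured. Since groupingByConcurrent is an unordered collector, the order of the demographics within each county is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical, trimmed-down versions of the question's classes.
class Population {
    final String ctyId;
    final String demographic;
    Population(String ctyId, String demographic) {
        this.ctyId = ctyId;
        this.demographic = demographic;
    }
    String getCtyId() { return ctyId; }
}

class CountyPopulation {
    final List<String> demographics = new ArrayList<>();
    void accept(Population pop) { demographics.add(pop.demographic); }
    CountyPopulation combine(CountyPopulation other) {
        demographics.addAll(other.demographics);
        return this;
    }
}

class ConcurrentGroupingSketch {
    static ConcurrentMap<String, CountyPopulation> byCounty(List<Population> populations) {
        Collector<Population, CountyPopulation, CountyPopulation> perCounty =
            Collector.of(CountyPopulation::new,
                         CountyPopulation::accept,
                         CountyPopulation::combine);
        // All threads insert into one shared ConcurrentMap instead of
        // building per-thread maps that have to be merged afterwards.
        return populations.parallelStream()
                          .collect(Collectors.groupingByConcurrent(Population::getCtyId, perCounty));
    }

    public static void main(String[] args) {
        List<Population> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(new Population(i % 2 == 0 ? "089" : "001", "d" + i));
        }
        ConcurrentMap<String, CountyPopulation> result = byCounty(data);
        System.out.println(result.get("089").demographics.size()); // 500
    }
}
```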