java - Categorizing results from a Lucene index

I have a Lucene index, generated through Hibernate with the help of Hibernate Search annotations, with 3 fields (to simplify a bit) that describe an article:

id, title, brand

Example content:

id, title, brand
1, "Long skirt", "Sweet and Gabbana"
2, "Sweet neck vest", "Armani"
3, "Sweet feeling shirt", "Armani"

Note how "Sweet and Gabbana", "Sweet neck vest" and "Sweet feeling shirt" share the word "sweet".

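I would like to write a Lucene query such that, if I search for the keyword "sweet", I get 2 different categories: one for the title and one for the brand. For example:

Title -> "Sweet neck vest", "Sweet feeling shirt"
Brand -> "Sweet and Gabbana"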

In other words, I want to show the user that the system found results in these two different categories.

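When I run the query (a kind of OR between title and brand), I get all three entries (the Lucene documents with IDs 1, 2 and 3), each matching on one property or the other. But how can I categorize them?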

@PersistenceContext
private EntityManager em;

...

@Override
public List<ArticleByIndexModel> retrieveArticlesSearchQueryResult(final String searchString,
        final String languageIso639) {

    final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
    final org.apache.lucene.search.Query luceneQuery = buildUpArticlesSearchLuceneQuery(searchString,
            languageIso639, fullTextEntityManager);

    final String titleFieldName = ArticleTranslationFieldPrefixes.TITLE + languageIso639;
    final String brandNameFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME;

    final FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(luceneQuery);
    fullTextQuery.setMaxResults(50);
    fullTextQuery.setProjection(Article_.articleID.getName(), titleFieldName, brandNameFieldName,
            Article_.brandSku.getName(), FullTextQuery.DOCUMENT_ID, FullTextQuery.EXPLANATION, FullTextQuery.THIS);

    @SuppressWarnings("unchecked")
    final List<Object[]> list = (List<Object[]>) fullTextQuery.getResultList();

    final List<ArticleByIndexModel> resultList = list.stream()
            .map(x -> new ArticleByIndexModel((Integer) x[0], (String) x[1])).collect(Collectors.toList());
    return resultList;
}

private org.apache.lucene.search.Query buildUpArticlesSearchLuceneQuery(final String searchString,
        final String languageIso639, final FullTextEntityManager fullTextEntityManager) {

    final String brandSkuName = Article_.brandSku.getName();

    final String analyzerPartName = ArticleTranslationDiscriminator.getAnalyzerPartNameByLanguage(languageIso639);
    final String titleFieldName = ArticleTranslationFieldPrefixes.TITLE + languageIso639;
    final String titleEdgeNGramFieldName = ArticleTranslationFieldPrefixes.TITLE_EDGE_N_GRAM + languageIso639;
    final String titleNGramFieldName = ArticleTranslationFieldPrefixes.TITLE_N_GRAM + languageIso639;

    final String brandNameEdgeNGramFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME_EDGE_N_GRAM;
    final String brandNameNGramFieldName = BrandTaxonomy.BrandTaxonomyNameFieldName.NAME_N_GRAM;

    final SearchFactory searchFactory = fullTextEntityManager.getSearchFactory();
    final QueryBuilder qb = searchFactory.buildQueryBuilder().forEntity(Article.class)
            .overridesForField(titleFieldName, ArticleTranslationFieldPrefixes.TITLE + analyzerPartName)
            .overridesForField(titleEdgeNGramFieldName,
                    ArticleTranslationFieldPrefixes.TITLE_EDGE_N_GRAM + analyzerPartName)
            .overridesForField(titleNGramFieldName, ArticleTranslationFieldPrefixes.TITLE_N_GRAM + analyzerPartName)
            .get();

    final org.apache.lucene.search.Query luceneQuery =
            /**/
            qb.bool()
                    /**/
                    .should(qb.phrase().withSlop(2).onField(titleNGramFieldName).andField(titleEdgeNGramFieldName)
                            .boostedTo(5).sentence(searchString.toLowerCase()).createQuery())
                    /**/
                    .should(qb.phrase().withSlop(2).onField(brandNameNGramFieldName)
                            .andField(brandNameEdgeNGramFieldName).boostedTo(5).sentence(searchString.toLowerCase())
                            .createQuery())
                    /**/
                    .should(qb.keyword().onField(brandSkuName).matching(searchString.toLowerCase()).createQuery())
                    /**/
                    .createQuery();

    return luceneQuery;
}

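Apart from running 2 different queries and then merging the results, I cannot see any solution.

I read about faceting, but I don't think it applies to this case.

Do you have any ideas?

Thanks!!!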

Best answer

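I assume you need to display the results as a single list, with some description for each item (matched because of the title / because of the brand / because of both).

I don't think there is any feature in Hibernate Search that would let you do that. I guess there would be some way to do it with the low-level Lucene APIs (collectors), but that would involve some black magic, and I don't think we could plug it into Hibernate Search.

So, let's take the simple path: do it yourself.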

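Personally, I would just run multiple queries:

A first time, exactly as you did in your example
A second time, with a projection on the ID (.setProjection(ProjectionConstants.ID)) and only two clauses: one requiring each match to have the same ID as one of the results of the first query (basically must(should(id=<firstID>), should(id=<secondID>), ...)), and another requiring the search string to match the title (basically must(title=<searchString>))
A third time, similar to the second, but with the brand instead of the title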

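Then I would use the results of the second and third queries to determine whether a given result matched because of the title or because of the brand.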
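The merging step can be sketched in plain Java. This is only a minimal illustration: MatchCategorizer, MatchCategory, categorize and groupByCategory are hypothetical names, not part of Hibernate Search, and the sketch assumes the three queries have already run and that their results have been reduced to integer article IDs. The OTHER category is an assumption to cover results that matched neither title nor brand, for example through the brandSku clause of the original query.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a Hibernate Search class): it only merges ID
// collections that the three queries are assumed to have returned already.
final class MatchCategorizer {

    // OTHER covers results of the first query that appear in neither
    // the title-restricted nor the brand-restricted query.
    enum MatchCategory { TITLE, BRAND, BOTH, OTHER }

    // Classifies every ID of the first (main) query, preserving its ranking order.
    static Map<Integer, MatchCategory> categorize(List<Integer> mainQueryIds,
            Set<Integer> titleMatchIds, Set<Integer> brandMatchIds) {
        Map<Integer, MatchCategory> categories = new LinkedHashMap<>();
        for (Integer id : mainQueryIds) {
            boolean inTitle = titleMatchIds.contains(id);
            boolean inBrand = brandMatchIds.contains(id);
            MatchCategory category;
            if (inTitle && inBrand) {
                category = MatchCategory.BOTH;
            } else if (inTitle) {
                category = MatchCategory.TITLE;
            } else if (inBrand) {
                category = MatchCategory.BRAND;
            } else {
                category = MatchCategory.OTHER;
            }
            categories.put(id, category);
        }
        return categories;
    }

    // Groups the classified IDs per category, e.g. to render
    // "Title -> ..." and "Brand -> ..." sections.
    static Map<MatchCategory, List<Integer>> groupByCategory(Map<Integer, MatchCategory> categories) {
        Map<MatchCategory, List<Integer>> groups = new EnumMap<>(MatchCategory.class);
        for (Map.Entry<Integer, MatchCategory> entry : categories.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), key -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

With the sample data from the question, and assuming the first query returns IDs 1, 2 and 3, the title query returns 2 and 3, and the brand query returns 1, categorize maps ID 1 to BRAND and IDs 2 and 3 to TITLE. Using sets for the second and third results keeps each lookup cheap, and the LinkedHashMap keeps the relevance order of the first query.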

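Of course, this only works if you expect the search string to match the title or the brand (or both) in its entirety, and not if some parts of the search string match the title while other parts match the brand. But if that is what you want, then your current query is wrong anyway...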