Problem description
I have two fields in an entity class:
- establishmentName
- contactType
contactType has values like PBX, GSM, TEL and FAX
I want a scoring mechanism that returns the best-matching data first, and then orders the results by contact type: PBX, TEL, GSM and FAX.
Scoring:
- On establishmentName, return the best-matching data first
- On contactType, return PBX first, then TEL, and so on
My final query is:
But it does not return any results.
My question is: how can I boost a particular field based on its different values?
We can use the following query for two different fields:
Query query = qb.keyword()
    .onField(field_one).boostedTo(2.0f)
    .andField(field_two)
    .matching(searchTerm)
    .createQuery();
But I need to boost a field based on its values; in my case that field is contactType.
My dataset:
(establishmentName : Concert Decoration, contactType : GSM)
(establishmentName : Elissa Concert, contactType : TEL)
(establishmentName : Yara Concert, contactType : FAX)
(establishmentName : E Concept, contactType : TEL)
(establishmentName : Infinity Concept, contactType : FAX)
(establishmentName : SD Concept, contactType : PBX)
(establishmentName : Broadcom Technical Concept, contactType : GSM)
(establishmentName : Concept Businessmen, contactType : PBX)
Searching for the term "concert" (fuzzy query on establishmentName) should return the list below:
(establishmentName : Elissa Concert, contactType : TEL)
(establishmentName : Concert Decoration, contactType : GSM)
(establishmentName : Yara Concert, contactType : FAX)
(establishmentName : Concept Businessmen, contactType : PBX)
(establishmentName : SD Concept, contactType : PBX)
(establishmentName : E Concept, contactType : TEL)
(establishmentName : Broadcom Technical Concept, contactType : GSM)
(establishmentName : Infinity Concept, contactType : FAX)
Recommended answer
From what I understand you basically want a two-phase sort:
- Put exact matches before other (fuzzy) matches.
- Sort by contact type.
The second sort is trivial, but the first one will require a bit of work. You can actually rely on scoring to implement it.
Essentially the idea would be to run a disjunction of multiple queries, and to assign a constant score to each query.
Instead of doing this:
Query query = qb.keyword()
    .fuzzy().withEditDistanceUpTo(1)
    .boostedTo(2.5f)
    .onField("establishmentName")
    .matching(searchTerm)
    .createQuery();
Do this:
Query query = qb.bool()
    .should(qb.keyword()
        .withConstantScore().boostedTo(100.0f) // Higher score, sort first
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .should(qb.keyword()
        .fuzzy().withEditDistanceUpTo(1)
        .withConstantScore().boostedTo(1.0f) // Lower score, sort last
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .createQuery();
The matched documents will be the same, but now the query will assign predictable scores: 1.0 for fuzzy-only matches, and 101.0 (1 from the fuzzy query and 100 from the exact query) for exact matches.
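To make that arithmetic concrete, here is a minimal plain-Java sketch. It does not use Hibernate Search or Lucene: the class `ConstantScoreSketch`, its whitespace tokenization and its hand-written edit distance are assumptions made purely for illustration. It only models the idea that each matching clause contributes its constant boost and the contributions are summed.

```java
import java.util.Arrays;

/** Plain-Java model of the constant-score arithmetic; NOT Hibernate Search code. */
final class ConstantScoreSketch {

    static final float EXACT_BOOST = 100.0f; // constant score of the exact keyword clause
    static final float FUZZY_BOOST = 1.0f;   // constant score of the fuzzy clause

    /** Levenshtein edit distance between two strings (two-row dynamic programming). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Sums the constant scores of the clauses that match at least one token of the name. */
    static float score(String establishmentName, String searchTerm) {
        String term = searchTerm.toLowerCase();
        // Assumption: simple whitespace tokenization stands in for the real analyzer.
        String[] tokens = establishmentName.toLowerCase().split("\\s+");
        boolean exact = Arrays.stream(tokens).anyMatch(t -> t.equals(term));
        boolean fuzzy = Arrays.stream(tokens).anyMatch(t -> editDistance(t, term) <= 1);
        return (exact ? EXACT_BOOST : 0.0f) + (fuzzy ? FUZZY_BOOST : 0.0f);
    }

    public static void main(String[] args) {
        System.out.println(score("Elissa Concert", "concert")); // exact and fuzzy clauses both match
        System.out.println(score("SD Concept", "concert"));     // only the fuzzy clause matches
        System.out.println(score("Acme Trading", "concert"));   // hypothetical name, no clause matches
    }
}
```

Because an exact match is also within edit distance 1 of the term, it collects both contributions, which is why exact matches always outrank fuzzy-only ones in this model.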
This way, you can define the sort as follows:
fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType")
    .createSort());
This may not be a very elegant, or optimized solution, but I think it will work.
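One caveat with sorting on the raw contactType value: assuming the field is indexed as plain text and sorted as a string (an assumption about the mapping, not something stated in the question), the result is alphabetical order rather than PBX, TEL, GSM, FAX. A tiny plain-Java sketch of that natural ordering, using a hypothetical helper class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Shows the natural (alphabetical) order of the contact type names. */
final class AlphabeticalContactTypeSketch {

    /** Returns a copy of the given names sorted in natural String order. */
    static List<String> sortedAlphabetically(List<String> types) {
        List<String> copy = new ArrayList<>(types);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Input is given in the desired order: PBX, TEL, GSM, FAX
        List<String> desired = Arrays.asList("PBX", "TEL", "GSM", "FAX");
        // Alphabetical sorting puts FAX first and TEL last instead
        System.out.println(sortedAlphabetically(desired));
    }
}
```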
To customize the relative order of contact types, I would suggest a different approach: use a custom bridge to index numbers instead of "PBX"/"TEL"/etc., assigning to each contact type the ordinal you expect. Essentially something like this:
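// Imports for the two classes below (Hibernate Search 5.x packages)
import org.apache.lucene.document.Document;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;
import org.hibernate.search.bridge.MetadataProvidingFieldBridge;
import org.hibernate.search.bridge.spi.FieldMetadataBuilder;
import org.hibernate.search.bridge.spi.FieldType;

public class Establishment {
    // Extra field used only for sorting; the bridge indexes an ordinal instead of the type name
    @Field(name = "contactType_sort", bridge = @FieldBridge(impl = ContactTypeOrdinalBridge.class))
    private ContactType contactType;
}

public class ContactTypeOrdinalBridge implements MetadataProvidingFieldBridge {
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if ( value != null ) {
            int ordinal = getOrdinal((ContactType) value);
            luceneOptions.addNumericFieldToDocument(name, ordinal, document);
            // Doc values are what make the field sortable
            luceneOptions.addNumericDocValuesFieldToDocument(name, ordinal, document);
        }
    }

    @Override
    public void configureFieldMetadata(String name, FieldMetadataBuilder builder) {
        builder.field(name, FieldType.INTEGER).sortable(true);
    }

    // Lower ordinal sorts first: PBX, TEL, GSM, FAX, then anything else
    private int getOrdinal(ContactType value) {
        switch( value ) {
            case PBX: return 0;
            case TEL: return 1;
            case GSM: return 2;
            case FAX: return 3;
            default: return 4;
        }
    }
}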
Then reindex, and sort like this:
fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType_sort")
    .createSort());
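As a sanity check of the intended ordering, the two-phase sort can be modeled in plain Java with a comparator: score descending, then contact type ordinal ascending. This is only a sketch and does not touch the index. The class `TwoPhaseSortSketch`, the `Hit` type and the hard-coded scores (101 for exact matches of "concert", 1 for fuzzy-only matches) are assumptions for illustration, and the enum is declared in the desired order so that `ordinal()` mirrors the bridge's numbering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of "sort by score, then by contact type ordinal". */
final class TwoPhaseSortSketch {

    // Declared in the desired order, so ordinal() gives PBX=0, TEL=1, GSM=2, FAX=3
    enum ContactType { PBX, TEL, GSM, FAX }

    /** Hypothetical search hit: a name, its contact type and its (constant) score. */
    static final class Hit {
        final String name;
        final ContactType type;
        final float score;

        Hit(String name, ContactType type, float score) {
            this.name = name;
            this.type = type;
            this.score = score;
        }
    }

    // Phase 1: higher score first. Phase 2: lower contact type ordinal first.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.type.ordinal());

    /** The question's dataset with the scores assumed for the term "concert". */
    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit("Concert Decoration", ContactType.GSM, 101f),
                new Hit("Elissa Concert", ContactType.TEL, 101f),
                new Hit("Yara Concert", ContactType.FAX, 101f),
                new Hit("E Concept", ContactType.TEL, 1f),
                new Hit("Infinity Concept", ContactType.FAX, 1f),
                new Hit("SD Concept", ContactType.PBX, 1f),
                new Hit("Broadcom Technical Concept", ContactType.GSM, 1f),
                new Hit("Concept Businessmen", ContactType.PBX, 1f));
    }

    /** Returns a sorted copy of the hits, leaving the input untouched. */
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        for (Hit h : sorted(sampleHits())) {
            System.out.println(h.name + " / " + h.type + " / " + h.score);
        }
    }
}
```

In this model the exact matches come out as TEL, GSM, FAX and the fuzzy-only matches as PBX, PBX, TEL, GSM, FAX, which matches the contact type sequence requested in the question. Hits with the same score and the same contact type simply keep their input order here; the order of such ties in a real search result is not determined by this sort.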