Can SOLR MoreLikeThis use different fields for the model and for matching?

Problem description

Let's say I have documents with two fields, A and B.

I'd like to use SOLR's MoreLikeThis, but with a twist: I'm most interested in boosting documents whose A field is like my model document's B field. (That is, extract MLT's 'interesting terms' from the model B field, but only collect MLT results based on the A field.)

I don't see a way to use the mlt.fl fields or mlt.qf boosts to achieve this effect in a single query. (It seems mlt.fl specifies fields used for both discovery of 'interesting terms' and matching to those terms.) Am I missing some option?

Or will I have to extract the 'interesting terms' myself and swap the 'field:term' details?

(Other ideas in this same vein appreciated as well.)

Recommended answer

I now think there are two ways to achieve the desired effect (without customizing the MLT source code).

First option: Do an initial MLT query with the MLT handler, adding the parameter &mlt.interestingTerms=details. The response then includes the list of terms that were deemed interesting, along with their relative boosts. Normally, MLT would use those discovered terms against the same mlt.fl fields to find similar documents. For example, the response will include something like:

"interestingTerms": 
    ["field_b:foo",5.0,"field_b:bar",2.9085307,"field_b:baz",1.67070794]

(Since the interestingTerms are the only useful output of this initial query, throwing in an fq that rules out all docs could help it skip unnecessary scoring work.)
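A minimal Python sketch of that first request, only building the URL. It assumes a hypothetical core named mycore with its MLT handler at /mlt, a uniqueKey field named id, and a hypothetical model document id doc42. The fq value -*:* is one assumed way to say "match no documents"; verify it behaves that way on your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, core name ("mycore") and handler path.
SOLR_MLT_URL = "http://localhost:8983/solr/mycore/mlt"


def initial_mlt_params(model_id: str) -> dict:
    """Parameters for the first MLT request, whose only purpose is to
    obtain the interesting terms extracted from field_b of the model doc."""
    return {
        "q": f"id:{model_id}",              # selects the model document (id not escaped here)
        "mlt.fl": "field_b",                 # discover interesting terms in field B
        "mlt.interestingTerms": "details",   # return the terms together with their boosts
        "fq": "-*:*",                        # assumed "match nothing" filter to skip scoring
        "wt": "json",
    }


def initial_mlt_url(model_id: str) -> str:
    # urlencode percent-encodes ':' and '*' in the parameter values.
    return SOLR_MLT_URL + "?" + urlencode(initial_mlt_params(model_id))


if __name__ == "__main__":
    print(initial_mlt_url("doc42"))
```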

Explicitly re-composing that interestingTerms info into a new OR query, field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794, amounts to using the field B example text to find documents that are similar in field A, and may closely mimic the kind of query default MLT runs against its usual model field.
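The re-composition step can be sketched in Python as below. It assumes the flat term/boost list shape shown in the sample response above and Solr's default q.op of OR, so space-separated clauses are OR'd. The function name is hypothetical, and terms containing Solr query-syntax characters would additionally need escaping, which this sketch does not do.

```python
def recompose_for_field(interesting_terms, target_field="field_a"):
    """Turn a flat MLT 'details' list such as
    ["field_b:foo", 5.0, "field_b:bar", 2.9085307]
    into a boosted query that targets target_field instead."""
    if len(interesting_terms) % 2:
        raise ValueError("expected alternating term/boost pairs")
    clauses = []
    # Pair each "field:term" string with the boost that follows it.
    for term, boost in zip(interesting_terms[0::2], interesting_terms[1::2]):
        # Drop the source field prefix ("field_b:") and keep the term text.
        _, _, text = term.partition(":")
        clauses.append(f"{target_field}:{text}^{boost}")
    # Space-separated clauses; OR'd under the (assumed) default q.op=OR.
    return " ".join(clauses)


if __name__ == "__main__":
    terms = ["field_b:foo", 5.0, "field_b:bar", 2.9085307, "field_b:baz", 1.67070794]
    print(recompose_for_field(terms))
    # prints: field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794
```

The resulting string can then be sent as q to an ordinary search handler.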

Second option: Grab the model document's actual field B text and feed it directly as a ContentStream body, used in lieu of a query to specify the model document. Then target mlt.fl at field A to collect similar results. For example, a fragment of the parameters might be …&stream.body=foo bar baz&mlt.fl=field_a&…. Again, the net effect is that model text originally from field_b finds documents similar only in field_a.
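A minimal Python sketch of the second option's parameters, again only building the URL against the same hypothetical /mlt endpoint. Two assumptions worth checking: newer Solr versions may have stream.body disabled unless enabled in the request parser configuration, and the mlt.mintf=1 setting is an addition (not part of the answer) so that a short body whose terms appear only once can still yield interesting terms.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, core name ("mycore") and handler path.
SOLR_MLT_URL = "http://localhost:8983/solr/mycore/mlt"


def stream_body_mlt_params(model_text: str) -> dict:
    """Use field_b text of the model document as the MLT input,
    but match only against field_a."""
    return {
        "stream.body": model_text,  # model text in lieu of a q selecting a document
        "mlt.fl": "field_a",        # discover and match terms in field A only
        "mlt.mintf": "1",           # assumption: allow terms that occur once in short text
        "wt": "json",
    }


def stream_body_mlt_url(model_text: str) -> str:
    # Spaces in the body are encoded as '+' by urlencode.
    return SOLR_MLT_URL + "?" + urlencode(stream_body_mlt_params(model_text))


if __name__ == "__main__":
    print(stream_body_mlt_url("foo bar baz"))
```

For long field B texts, sending the same parameters as a POST form body avoids URL length limits.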

10-19 22:38