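If I understand the Lucene Javadoc correctly, setting a CustomScoreQuery instance to strict should pass the value of the FunctionQuery's field source (here a FloatFieldSource) to the CustomScoreProvider method public float customScore(int doc, float subQueryScore, float valSrcScore) as valSrcScore, without any modification such as normalization.
So I assumed that valSrcScore would be exactly the float value stored in the document's field.

However, this does not seem to hold once the amount of indexed data grows. Here is a simple example that shows what I mean: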

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queries.*;
import org.apache.lucene.queries.function.FunctionQuery;
import org.apache.lucene.queries.function.valuesource.FloatFieldSource;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import java.io.IOException;
public class CustomScoreTest {
    public static void main(String[] args) throws IOException {
        RAMDirectory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
        IndexWriter writer = new IndexWriter(index, config);

        // prepare dummy text
        String text = "";
        for (int i = 0; i < 1000; i++) text += "abc ";

        // add dummy docs
        for (int i = 0; i <25000; i++) {
            Document doc = new Document();
            doc.add(new FloatField("number", i * 100f, Field.Store.YES));
            doc.add(new TextField("text", text, Field.Store.YES));
            writer.addDocument(doc);
        }
        writer.close();

        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);

        Query q1 = new TermQuery(new Term("text", "abc"));
        CustomScoreQuery q2 = new CustomScoreQuery(q1, new FunctionQuery(new FloatFieldSource("number"))) {
            protected CustomScoreProvider getCustomScoreProvider(AtomicReaderContext ctx) throws IOException {
                return new CustomScoreProvider(ctx) {
                    public float customScore(int doc, float subQueryScore, float valSrcScore) throws IOException {
                        float diff = Math.abs(valSrcScore - searcher.doc(doc).getField("number").numericValue().floatValue());
                        if (diff > 0) throw new IllegalStateException("diff: " + diff);
                        return super.customScore(doc, subQueryScore, valSrcScore);
                    }
                };
            }
        };

        // In strict custom scoring, the ValueSource part does not participate in weight normalization.
        // This may be useful when one wants full control over how scores are modified, and
        // does not care about normalising by the ValueSource part
        q2.setStrict(true);

        // Exception in thread "main" java.lang.IllegalStateException: diff: 1490700.0
        searcher.search(q2, 10);
    }
}


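As the example shows, the exception is thrown because valSrcScore differs greatly from the actual value stored in the document's "number" field.

However, when I reduce the number of indexed dummy documents to 2500, it works as expected: the difference between valSrcScore and the value in the "number" field is 0.

What am I doing wrong here?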

Best answer

Which version of Lucene are you running? One possibility is that, as the index grows, AtomicReaderContext needs to be replaced with LeafReaderContext. This is only a guess.
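
Another possible explanation, offered as an assumption rather than something confirmed in the question: the doc argument of customScore is relative to the segment behind the AtomicReaderContext, while searcher.doc(...) expects an index-wide document ID. With 2500 documents the index may fit in a single segment whose docBase is 0, so both IDs coincide. With 25000 documents there may be several segments, and the lookup then fetches a different document. The reported diff of 1490700.0 equals 100 × 14907, which would be consistent with a segment starting at document 14907. The sketch below is a toy model in plain Java, not Lucene code. Segment, buildIndex, lookupGlobal, and maxDiff are hypothetical names, and the segment size of 10000 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a segmented index. Not Lucene code; all names are hypothetical. */
public class DocBaseSketch {

    /** One segment: the index-wide ID of its first document plus its stored values. */
    static final class Segment {
        final int docBase;
        final float[] values;

        Segment(int docBase, float[] values) {
            this.docBase = docBase;
            this.values = values;
        }
    }

    /** Splits documents 0..total-1 into segments; document i stores i * 100f. */
    static List<Segment> buildIndex(int total, int segmentSize) {
        List<Segment> segments = new ArrayList<>();
        for (int base = 0; base < total; base += segmentSize) {
            int size = Math.min(segmentSize, total - base);
            float[] values = new float[size];
            for (int j = 0; j < size; j++) {
                values[j] = (base + j) * 100f;
            }
            segments.add(new Segment(base, values));
        }
        return segments;
    }

    /** Index-wide lookup by global ID, standing in for a searcher-level fetch. */
    static float lookupGlobal(List<Segment> segments, int globalDoc) {
        for (Segment s : segments) {
            if (globalDoc >= s.docBase && globalDoc < s.docBase + s.values.length) {
                return s.values[globalDoc - s.docBase];
            }
        }
        throw new IllegalArgumentException("no such doc: " + globalDoc);
    }

    /** Largest gap between a segment's own value and the value the lookup returns. */
    static float maxDiff(List<Segment> segments, boolean addDocBase) {
        float max = 0f;
        for (Segment s : segments) {
            for (int local = 0; local < s.values.length; local++) {
                // The mistake being modelled: using the segment-local ID as a global ID.
                int id = addDocBase ? s.docBase + local : local;
                float diff = Math.abs(s.values[local] - lookupGlobal(segments, id));
                max = Math.max(max, diff);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<Segment> small = buildIndex(2500, 10000);   // one segment
        List<Segment> large = buildIndex(25000, 10000);  // three segments
        System.out.println(maxDiff(small, false)); // 0.0: local and global IDs coincide
        System.out.println(maxDiff(large, false)); // 2000000.0: wrong documents fetched
        System.out.println(maxDiff(large, true));  // 0.0: docBase + local is the global ID
    }
}
```

If this is the cause, then in the question's code the stored field could be read with searcher.doc(ctx.docBase + doc) or ctx.reader().document(doc) instead of searcher.doc(doc). This has not been verified against the asker's setup.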

Source: the Stack Overflow question "java - Lucene CustomScoreQuery does not pass the value of FieldSource from FunctionQuery": https://stackoverflow.com/questions/28263382/