Problem description
I built a custom collector for Lucene.Net, but I can't figure out how to order (or page) the results. Every time Collect gets called, I can add the result to an internal PriorityQueue, which I understand is the correct way to do this.
I extended PriorityQueue, but it requires a size parameter on creation: you have to call Initialize in the constructor and pass in the maximum size.
However, in a collector, the searcher just calls Collect when it gets a new result, so I don't know how many results I have when I create the PriorityQueue. Based on this, I can't figure out how to make the PriorityQueue work.
I realize I'm probably missing something simple here...
Recommended answer
PriorityQueue is not a SortedList or a SortedDictionary. It is a kind of sorting implementation that returns the top M results (M being your PriorityQueue's size) out of N elements. You can add as many items as you want with InsertWithOverflow, but it will hold only the top M elements.
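To make the "top M of N" behaviour concrete, here is a minimal sketch that does not depend on Lucene.Net. It assumes .NET 6 or later and uses the BCL `System.Collections.Generic.PriorityQueue<TElement, TPriority>` (a min-heap), not Lucene's class. `TopN`, its `InsertWithOverflow` method and `DrainDescending` are hypothetical names chosen for illustration; they only mimic the idea that the queue never grows past its maximum size and always evicts its weakest element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not Lucene.Net's API): keeps only the best `maxSize`
// items by key, in the spirit of Lucene's InsertWithOverflow.
public sealed class TopN<T, TKey> where TKey : IComparable<TKey>
{
    private readonly int _maxSize;
    private readonly Func<T, TKey> _keySelector;
    // System.Collections.Generic.PriorityQueue (.NET 6+) is a min-heap:
    // its root is the weakest item currently kept.
    private readonly PriorityQueue<T, TKey> _heap = new PriorityQueue<T, TKey>();

    public TopN(int maxSize, Func<T, TKey> keySelector)
    {
        if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
        _maxSize = maxSize;
        _keySelector = keySelector;
    }

    public int Count => _heap.Count;

    // Adds an item; once full, the weakest of (kept items + new item) is dropped,
    // so the heap never holds more than maxSize items no matter how many are added.
    public void InsertWithOverflow(T item)
    {
        TKey key = _keySelector(item);
        if (_heap.Count < _maxSize)
            _heap.Enqueue(item, key);
        else
            _heap.EnqueueDequeue(item, key); // returns (and here discards) the minimum
    }

    // Empties the heap and returns the kept items, best (largest key) first.
    public List<T> DrainDescending()
    {
        var result = new List<T>(_heap.Count);
        while (_heap.Count > 0)
            result.Add(_heap.Dequeue()); // Dequeue yields the smallest key first
        result.Reverse();
        return result;
    }
}
```

Memory stays bounded by M regardless of N, which is why you don't need to know the total number of hits up front: you only choose how many you want to keep.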
Suppose your search resulted in 1,000,000 hits. Would you return all of the results to the user? A better way is to return the top 10 elements to the user (using PriorityQueue(10)), and if the user requests the next 10 results, you can run a new search with PriorityQueue(20) and return the next 10 elements, and so on. This is the trick most search engines, like Google, use.
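The paging trick can be sketched without Lucene as follows, assuming .NET 6+ and the BCL `PriorityQueue<TElement, TPriority>`. `Paging.GetPage` is a hypothetical helper, and plain `int` scores stand in for search hits. For page `p` (0-based) of size `k` it keeps only the top `(p + 1) * k` scores, then skips the `p * k` best ones that belonged to earlier pages.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Hypothetical helper: returns page `pageIndex` (0-based) of `pageSize` hits,
    // highest score first, by keeping only the top (pageIndex + 1) * pageSize
    // scores in a bounded min-heap (the "PriorityQueue(20) for page 2" idea).
    public static List<int> GetPage(IEnumerable<int> scores, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int maxSize = (pageIndex + 1) * pageSize;
        var heap = new PriorityQueue<int, int>(); // min-heap keyed on the score
        foreach (int s in scores)
        {
            if (heap.Count < maxSize)
                heap.Enqueue(s, s);
            else
                heap.EnqueueDequeue(s, s); // drop the lowest score once full
        }

        // Dequeue yields the lowest scores first; reverse to get best-first order.
        var ranked = new List<int>(heap.Count);
        while (heap.Count > 0)
            ranked.Add(heap.Dequeue());
        ranked.Reverse();

        // Skip the hits that belong to earlier pages.
        int start = pageIndex * pageSize;
        if (start >= ranked.Count)
            return new List<int>();
        return ranked.GetRange(start, Math.Min(pageSize, ranked.Count - start));
    }
}
```

The cost is that deeper pages keep more items and redo the whole search, but for the first few pages (which is what users mostly look at) this is cheap.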
"Every time Commit gets called, I can add the result to an internal PriorityQueue."
I cannot understand the relationship between Commit and search, so I will append a sample usage of PriorityQueue:
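using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Keeps the top maxSize documents according to LessThan.
public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
    public CustomQueue(int maxSize) : base()
    {
        Initialize(maxSize);
    }

    public override bool LessThan(Document a, Document b)
    {
        // Placeholder ordering: compare the stored field "field1" of both documents.
        // "field1" is only an example name; replace it with your own sort criterion.
        return string.CompareOrdinal(a.Get("field1"), b.Get("field1")) < 0;
    }
}

public class MyCollector : Lucene.Net.Search.Collector
{
    CustomQueue _queue = null;
    IndexReader _currentReader;

    public MyCollector(int maxSize)
    {
        _queue = new CustomQueue(maxSize);
    }

    // Exposes the collected top documents once the search has finished.
    public CustomQueue Queue { get { return _queue; } }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }

    public override void Collect(int doc)
    {
        // doc is relative to the segment reader passed to SetNextReader.
        _queue.InsertWithOverflow(_currentReader.Document(doc));
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _currentReader = reader;
    }

    public override void SetScorer(Scorer scorer)
    {
    }
}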
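// Each search keeps the top 10/20/30 hits; the requested page is the 10
// lowest-ranked of those, which Pop() returns first.
searcher.Search(query, new MyCollector(10)); // 1st page.
searcher.Search(query, new MyCollector(20)); // 2nd page.
searcher.Search(query, new MyCollector(30)); // 3rd page.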
Edit @nokturnal
using System;
using System.Collections.Generic;

public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
    where TComp : IComparable<TComp>
{
    Func<TObj, TComp> _KeySelector;

    public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
    {
        _KeySelector = keySelector;
        Initialize(size);
    }

    public override bool LessThan(TObj a, TObj b)
    {
        return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
    }

    // Drains the queue: Pop() removes the least element first, so items are
    // yielded in ascending key order and the queue is empty afterwards.
    public IEnumerable<TObj> Items
    {
        get
        {
            int size = Size();
            for (int i = 0; i < size; i++)
                yield return Pop();
        }
    }
}
var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
// ... call pq.InsertWithOverflow(doc) for each hit ...
foreach (var item in pq.Items)
{
    // items arrive lowest key first
}