Can I insert a Document into Lucene without generating a TokenStream?

Question

Is there a way to add a document to the index by supplying terms and term frequencies directly, rather than via Analysis and/or TokenStream? I ask because I want to model some data where I know the term frequencies, but there is no underlying text document to be analyzed. I could create one by repeating the same term many times (I don't care about positions or highlighting in this case, either, just scoring), but that seems a bit perverse (and probably slower than just supplying the counts directly).

(also asked on the mailing list)

Recommended answer

At any rate, you don't need to pass everything through an Analyzer in order to create the document. I'm not aware of any way to pass in Terms and Frequencies as you've asked (though I'd be interested to know if you find a good approach to it), but you can certainly pass in IndexableFields one term at a time. That would still require you to add each term multiple times, like:

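// StringField indexes with IndexOptions.DOCS, which drops frequencies; use a type that keeps them.
FieldType type = new FieldType();
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
type.setTokenized(false);  // index the value as a single term, bypassing the Analyzer
IndexableField field = new Field(fieldName, myTerm, type);
for (int i = 0; i < frequency; i++) {
    document.add(field);  // one add per occurrence of the term
}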

You can also take a step further back and cut the Document class out entirely: IndexWriter.addDocument accepts any Iterable<IndexableField>, so a simple List, for instance, might suffice as a more direct way of modelling your data.
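
As a rough illustration of the List idea, here is a minimal sketch of expanding known term counts into a flat list with one entry per occurrence. The class TermFrequencyFields, its expand method, and the field name "body" are hypothetical names invented for this example; they are not part of Lucene. Plain strings stand in for fields so the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not part of Lucene): expands term counts into repeated fields. */
final class TermFrequencyFields {

    private TermFrequencyFields() {}

    /**
     * Calls fieldFactory once per occurrence of each term, so a term with
     * frequency 3 contributes three entries to the returned list.
     */
    static <F> List<F> expand(Map<String, Integer> termFrequencies,
                              Function<String, F> fieldFactory) {
        List<F> fields = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : termFrequencies.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                fields.add(fieldFactory.apply(entry.getKey()));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Known term frequencies with no underlying text to analyze.
        Map<String, Integer> termFrequencies = new LinkedHashMap<>();
        termFrequencies.put("lucene", 3);
        termFrequencies.put("index", 2);

        // Assumption: strings stand in for Lucene fields in this sketch.
        List<String> fields = expand(termFrequencies, term -> "body:" + term);
        System.out.println(fields);
    }
}
```

With Lucene available, the factory would presumably build a real untokenized field for each term instead of a string, and the resulting List<IndexableField> could be handed to IndexWriter.addDocument in place of a Document.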

Not sure if that gets you any closer to what you are looking for, but perhaps a step vaguely in the right direction.
