Problem Description
I'm very new to Solr and I'm evaluating it. My task is to look for words within a corpus of books and return them within a small context. So far, I'm storing the books in a database split by paragraphs (slicing the books by line breaks), I do a fulltext search and return the row.
In Solr, would I have to do the same, or can I add the whole book (in .txt format) and, whenever a match is found, return something like the match plus 100 words before and 100 words after or something like that? Thanks
Highlighting will do your bidding. http://wiki.apache.org/solr/HighlightingParameters
Here are relevant options for you:
hl.snippets: The maximum number of highlighted snippets to generate per field...
hl.fragsize: The size, in characters, of the snippets (aka fragments) created by the highlighter... The default value is "100".
hl.mergeContiguous: Collapse contiguous fragments into a single fragment...
For what you describe, set hl.snippets to return 5 snippets (or however many a human can sanely handle) from the text field, selected with hl.fl, and set hl.fragsize so each snippet is about 400 characters (my approximation of 100 words) around the matched word/phrase.
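To make that concrete, here is a minimal sketch (not part of the original answer) of what such a request could look like from Python with the requests library, assuming a Solr core named books and the full book text indexed and stored in a text field; the core name, field name, and query term are placeholders:

```python
# Minimal sketch: query Solr with highlighting turned on, using the
# parameters suggested above. Core name "books", field name "text",
# and the query term are assumptions, not from the original answer.
import requests

SOLR_SELECT_URL = "http://localhost:8983/solr/books/select"  # assumed core name

params = {
    "q": "text:leviathan",         # the word you are looking for (example term)
    "rows": 10,
    "fl": "id,title",              # keep stored fields small; context comes from highlighting
    "hl": "true",                  # turn highlighting on
    "hl.fl": "text",               # field to build snippets from
    "hl.snippets": 5,              # up to 5 snippets per document
    "hl.fragsize": 400,            # ~100 words of context per snippet
    "hl.mergeContiguous": "true",  # collapse adjacent fragments into one
    "wt": "json",
}

response = requests.get(SOLR_SELECT_URL, params=params)
data = response.json()

# Snippets are returned in the "highlighting" section of the response,
# keyed by document id.
for doc_id, fields in data["highlighting"].items():
    for snippet in fields.get("text", []):
        print(doc_id, "->", snippet)
```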
See also hl.regex.slop for building snippets around phrases, and hl.simple.pre / hl.simple.post for markup.
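Again only a sketch, these last two knobs could be layered onto the same request as above; the <mark> tags and the 0.5 slop value are arbitrary example choices:

```python
# Extra highlighting parameters, added to the request from the previous
# sketch (reuses SOLR_SELECT_URL and params defined there).
extra = {
    "hl.simple.pre": "<mark>",    # inserted before each matched term in a snippet
    "hl.simple.post": "</mark>",  # inserted after each matched term
    "hl.fragmenter": "regex",     # regex fragmenter, so snippets can follow phrase-like boundaries
    "hl.regex.slop": 0.5,         # how far fragsize may flex to honour those boundaries
}

response = requests.get(SOLR_SELECT_URL, params={**params, **extra})
```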