Problem description
I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get solr-tomcat) and it seems to be working fine. However, I'm having difficulty finding any coherent information on how to index documents. I'm new to Solr, so bear with me! I have a folder (/mnt/folder), a mounted Windows share, containing Word and PDF files that I would like indexed. What's the easiest way to get Solr to index the entire folder?
The documentation for Solr is pretty poor, and it's impossible to find any decent tutorials on getting things done with it, so any help is greatly appreciated!
Recommended answer
Take a look at the Solr wiki; it's pretty thorough documentation.
In particular, see the ExtractingRequestHandler, which allows you to index binary files like Word and PDF documents. Here's an introduction to the topic.
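As a rough illustration of how the ExtractingRequestHandler could be used for a whole folder, here is a minimal Python sketch that walks a directory and posts each Word or PDF file to the handler. Several things here are assumptions rather than facts from the question: that the handler is enabled in solrconfig.xml at /update/extract, that Solr is reachable at http://localhost:8080/solr (a typical Tomcat default), that the file path is an acceptable unique id, and that the content type can be detected from the file itself. The function names are made up for this example.

```python
import os
import urllib.parse
import urllib.request

# Assumed endpoints: adjust host, port, and path to your own installation.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"
EXTENSIONS = (".doc", ".docx", ".pdf")


def find_documents(root, extensions=EXTENSIONS):
    """Yield paths of files under root whose extension matches (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)


def build_extract_url(path, base_url=SOLR_EXTRACT_URL):
    """Build the request URL, using the file path as the document id (an assumption)."""
    return base_url + "?" + urllib.parse.urlencode({"literal.id": path})


def index_file(path, base_url=SOLR_EXTRACT_URL):
    """POST one file's raw bytes to the extract handler and return the HTTP status."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(path, base_url),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def commit(update_url=SOLR_UPDATE_URL):
    """Send a commit so the newly indexed documents become searchable."""
    request = urllib.request.Request(
        update_url, data=b"<commit/>", headers={"Content-Type": "text/xml"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_folder(root):
    """Index every matching file under root, then commit once at the end."""
    count = 0
    for path in find_documents(root):
        index_file(path)
        count += 1
    commit()
    return count
```

With a running Solr instance, calling index_folder("/mnt/folder") would post each matching file and commit once at the end. Committing once rather than per file is a design choice to keep the run fast on a large share.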
If the wiki isn't enough for you, there's also a great book about Solr.