Python file indexing and searching

Problem description

I have a large set of files (hdf) that I need to enable search for. For Java I would use Lucene for this, as it's a file and document indexing engine. I don't know what the Python equivalent would be, though.

Can anyone recommend which library I should use for indexing a large collection of files for fast search? Or is the preferred way to roll your own?

I have looked at PyLucene and Lupy, but both projects seem rather inactive and unsupported, so I am not sure if I should rely on them.

Final notes: Whoosh and PyLucene seem promising, but Whoosh is still alpha, so I am not sure I want to rely on it. I also had problems compiling PyLucene, and there are no actual releases of it. After looking a bit more at the data, it's mostly numbers and default text strings, so as of now an indexing engine won't help me. Hopefully these libraries will stabilize and later visitors will find some use for them.

Recommended answer

Lupy has been retired and the developers recommend PyLucene instead. As for PyLucene, its mailing list activity may be low, but it is definitely supported. In fact, it just recently became an official Apache subproject.

You may also want to look at a new contender: Whoosh. It's similar to Lucene, but implemented in pure Python.
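For anyone weighing the "roll your own" option, here is a minimal sketch of the core idea behind engines such as Lucene and Whoosh: an inverted index that maps each term to the set of documents containing it. This is a toy written with the standard library only; it is not the API of Whoosh or PyLucene, and the class name `InvertedIndex`, its methods, and the sample file names are all hypothetical. It also assumes you have already extracted searchable text from each file.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase the text and keep runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Record every term of `text` as occurring in `doc_id`."""
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        # Use .get so that unknown terms do not create empty entries.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Hypothetical documents: the file names and descriptions are made up.
index = InvertedIndex()
index.add("run1.hdf", "temperature sensor calibration run")
index.add("run2.hdf", "pressure sensor drift test")
index.add("notes.txt", "calibration notes for the pressure rig")

print(sorted(index.search("sensor")))
print(sorted(index.search("pressure calibration")))
```

A real engine adds what this sketch leaves out, such as on-disk storage, relevance ranking, stemming, and phrase queries, which is the usual reason to pick a library over a hand-rolled index.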

09-11 10:54