Problem description
Stemming is something tagging systems need. I use Delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I also write software for embedded systems that would be much more functional (more helpful to the user) if it included stemming.
For instance:
Parse
Parser
Parsing
Should all mean the same thing to whatever system I'm putting them into.
Ideally there's a BSD-licensed stemmer somewhere, but if not, where should I look to learn the common algorithms and techniques for this?
Aside from BSD stemmers, what other open source licensed stemmers are out there?
-Adam
Recommended answer
Snowball stemmer (C & Java). I've used its Python binding, PyStemmer.
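To make the tagging use case concrete, here is a minimal sketch of suffix stripping applied to tags. It is not the Snowball or Porter algorithm, just a naive illustration of the idea. The suffix list, the minimum stem length, and the names naive_stem and group_tags are all assumptions made up for this example. A real stemmer applies many more rules, and it may not merge exactly the same word forms this toy does.

```python
# Naive suffix-stripping sketch for tag normalization.
# NOT the Snowball/Porter algorithm: the suffix list and the
# minimum stem length below are hypothetical, chosen for illustration.

SUFFIXES = ("ing", "ers", "er", "ed", "es", "s", "e")  # tried in this order
MIN_STEM = 3  # never strip a suffix if fewer than 3 characters would remain


def naive_stem(tag: str) -> str:
    """Lowercase the tag and strip the first matching suffix, if any."""
    word = tag.strip().lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_tags(tags):
    """Group the original tags under their shared naive stem."""
    groups = {}
    for tag in tags:
        groups.setdefault(naive_stem(tag), []).append(tag)
    return groups


if __name__ == "__main__":
    print(group_tags(["Parse", "Parser", "Parsing", "blog"]))
```

In a tagging system the stem would serve as the lookup key, while the original tag text is kept for display.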