How to get all possible words from a given hunspell dictionary

Question

I would like to parse the Hunspell-format .aff and .dic files supported by OpenOffice.

English .aff and .dic files: http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice

I want to scan each line of the given .dic file and generate every possible word for each line using the provided .aff file.

How can I do that?

I have installed the NHunspell framework, but it does not have that feature: https://www.nuget.org/packages/NHunspell/

Taking English as an example, consider:

make/UAGS

make can be make, made, makes, making

Now I need a parser that gives me all these combinations. How can I obtain them? Thank you very much.

So basically I want to scan each line of the dictionary and generate all possible words from the word on that line, but I don't know how to do that.

I could also write my own parser, but the rules seem pretty complex to me, and there is no detailed, easy-to-follow documentation about them.

Here is basically what I want:

Given analyze/ADSG, en.dic, and en.aff, I want to obtain all of the following words:

analyze, analyzes, analyzing, analyzed, reanalyze, reanalyzes, reanalyzing, reanalyzed
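
For a sense of what this expansion involves, here is a minimal Python sketch that applies .aff prefix/suffix rules to one .dic entry. It is a simplification, not hunspell's full algorithm: it assumes single-character flags and ignores continuation classes, compounding, and directives such as NEEDAFFIX or CIRCUMFIX. EN_RULES is a hypothetical excerpt written to resemble the en_US rules for flags A, D, G and S; check it against the real en.aff before relying on it.

```python
import re
from dataclasses import dataclass

@dataclass
class AffixRule:
    kind: str       # "PFX" or "SFX"
    cross: bool     # header Y/N: may combine with the other affix type
    strip: str      # characters removed from the root ("" when the .aff says 0)
    add: str        # characters added ("" when the .aff says 0)
    condition: str  # regex-like condition, e.g. "[^aeiou]y" or "."

def parse_affix_rules(aff_text):
    """Collect PFX/SFX rules keyed by (kind, flag). Single-character flags only."""
    headers, rules = {}, {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue
        key = (parts[0], parts[1])
        if key not in headers:
            # The first line of each block is the header: KIND FLAG Y|N COUNT
            headers[key] = parts[2] == "Y"
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = parts[3].split("/", 1)[0]  # continuation flags are dropped (ignored here)
        add = "" if add == "0" else add
        condition = parts[4] if len(parts) > 4 else "."
        rules.setdefault(key, []).append(
            AffixRule(parts[0], headers[key], strip, add, condition))
    return rules

def apply_suffix(root, rule):
    """Return the suffixed form, or None if the condition does not match the root's end."""
    if root.endswith(rule.strip) and re.search(f"(?:{rule.condition})$", root):
        return root[:len(root) - len(rule.strip)] + rule.add
    return None

def prefix_applies(root, rule):
    """A prefix condition is checked against the beginning of the root."""
    return root.startswith(rule.strip) and re.match(f"(?:{rule.condition})", root) is not None

def expand(root, flags, rules):
    """Return the root plus every single-prefix/single-suffix form its flags allow."""
    forms = {root}
    suffixed = []  # suffixed forms that may also take a cross-product prefix
    for flag in flags:
        for rule in rules.get(("SFX", flag), []):
            form = apply_suffix(root, rule)
            if form is not None:
                forms.add(form)
                if rule.cross:
                    suffixed.append(form)
    for flag in flags:
        for rule in rules.get(("PFX", flag), []):
            if not prefix_applies(root, rule):
                continue
            forms.add(rule.add + root[len(rule.strip):])
            if rule.cross:
                for form in suffixed:
                    forms.add(rule.add + form[len(rule.strip):])
    return forms

# Hypothetical rules, written to resemble en_US.aff for flags A, D, G, S.
EN_RULES = """\
PFX A Y 1
PFX A 0 re .
SFX D Y 4
SFX D 0 d e
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
SFX S Y 4
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxzh]
SFX S 0 s [^sxzhy]
"""

stem, _, flags = "analyze/ADSG".partition("/")
print(sorted(expand(stem, flags, parse_affix_rules(EN_RULES))))
# ['analyze', 'analyzed', 'analyzes', 'analyzing', 'reanalyze', 'reanalyzed', 'reanalyzes', 'reanalyzing']
```

The Y in each block header is what allows the re- prefix to combine with the suffixes, which is where reanalyzed, reanalyzes, and reanalyzing come from.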

Answer

If you want the entire expanded word list, you can run unmunch:

unmunch dictionary.dic dictionary.aff
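
To consume unmunch from a program rather than the shell, a minimal Python sketch is to run it as a subprocess and collect its output. This assumes the hunspell unmunch binary is on PATH and prints one generated word per line; the function names are hypothetical, and the encoding should match the SET directive in your .aff file.

```python
import subprocess

def parse_word_list(output):
    """Turn tool output (assumed: one generated word per line) into a set."""
    return {line.strip() for line in output.splitlines() if line.strip()}

def unmunch_words(dic_path, aff_path, exe="unmunch", encoding="utf-8"):
    """Run unmunch with the argument order shown above: .dic first, then .aff.

    check=True raises CalledProcessError if unmunch exits with a nonzero status.
    """
    result = subprocess.run([exe, dic_path, aff_path], capture_output=True, check=True)
    return parse_word_list(result.stdout.decode(encoding, errors="replace"))
```

For example, `unmunch_words("dictionary.dic", "dictionary.aff")` would return the whole expanded list as a set, which also removes duplicate forms produced by different entries.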

Note that the current hunspell implementation of unmunch limits the maximum number of words, affixes, and the length of generated words, so unmunch may fail if the target language exceeds those limits.

If you just want the list of words that can be generated from a single entry, you can use wordforms:

wordforms dictionary.aff dictionary.dic word
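
To get the per-line behavior asked about in the question, one option is a small Python sketch that walks the .dic entries and calls wordforms for each stem. It assumes the wordforms script from hunspell is on PATH and prints one form per line; the helper names are hypothetical, escaped slashes in entries are not handled, and any morphological fields after an entry are dropped.

```python
import subprocess

def iter_dic_entries(dic_text):
    """Yield (stem, flags) pairs from .dic text.

    The first line of a .dic file holds the entry count, so it is skipped.
    """
    for line in dic_text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        stem, _, flags = fields[0].partition("/")
        yield stem, flags

def wordforms(aff_path, dic_path, word, exe="wordforms"):
    """Run wordforms for one word, with .aff first and .dic second as shown above."""
    result = subprocess.run([exe, aff_path, dic_path, word],
                            capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

A loop such as `for stem, _ in iter_dic_entries(text): print(stem, wordforms("dictionary.aff", "dictionary.dic", stem))` prints the forms entry by entry. Because every call starts a new process that reads the dictionary again, this is likely much slower than a single unmunch run when you need the whole dictionary.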