python - Get all possible English words from a string

Generate all possible English words that can be found in a given string, using Python.

Input: godaddy
Output: go, god, dad, add, daddy

Is there a good library for this?

Best answer

Try enchant, see http://pythonhosted.org/pyenchant/tutorial.html

>>> from nltk import everygrams
>>> import enchant
>>> d = enchant.Dict("en_US")
>>> word = 'godaddy'
>>> [''.join(_ngram) for _ngram in everygrams(word) if d.check(''.join(_ngram))]
['g', 'o', 'd', 'a', 'd', 'd', 'y', 'go', 'ad', 'god', 'dad', 'add', 'daddy']
>>> # Exclude single-character words.
>>> [''.join(_ngram) for _ngram in everygrams(word) if d.check(''.join(_ngram)) and len(_ngram) > 1]
['go', 'ad', 'god', 'dad', 'add', 'daddy']

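But if you want all combinations of the string (its contiguous character n-grams), regardless of whether each one is a valid English word: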

>>> list(everygrams(word))
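
If installing nltk or enchant is not an option, the same idea can be sketched in plain Python: enumerate every contiguous substring and keep the ones found in a word set. This is only an illustration; `substrings`, `words_in`, and `toy_vocab` are hypothetical names, and the tiny vocabulary stands in for a real dictionary.

```python
def substrings(text, min_len=1):
    """Yield every contiguous substring of text with at least min_len characters."""
    for start in range(len(text)):
        for end in range(start + min_len, len(text) + 1):
            yield text[start:end]


def words_in(text, vocab, min_len=2):
    """Return the distinct substrings of text that appear in vocab, in order of first occurrence."""
    seen = set()
    found = []
    for candidate in substrings(text, min_len):
        if candidate in vocab and candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
    return found


# Hypothetical toy vocabulary; a real run would load a proper word list.
toy_vocab = {"go", "god", "ad", "dad", "add", "daddy"}
result = words_in("godaddy", toy_vocab)
print(result)
```

The results come out ordered by start position rather than by length, so the order differs from the everygrams output above even though the set of words is the same.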

See also:

n-grams in python, four, five, six grams?
Generating Ngrams (Unigrams,Bigrams etc) from a large corpus of .txt files and their Frequency
extracting n grams from huge text
Fast/Optimize N-gram implementations in python
How to compute skipgrams in python?

Note

Any dictionary-checking approach has its limitations:

>>> from nltk.corpus import words as english
>>> vocab = set(w.lower() for w in english.words())
>>> "google" in vocab
False
>>> "stackoverflow" in vocab
False

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check('StackOverflow')
False
>>> d.check('Stackoverflow')
False
>>> d.check('Google')
True

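The "principled" way to do this task is to build a language model at the character level and use some probabilistic method to check whether a sequence of characters is more or less likely to be an English word.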
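
The answer does not say which model to use, so the following is only a rough sketch of the idea, assuming a character bigram model with add-one smoothing. The names `train_char_bigrams` and `avg_log_prob` and the toy training list are hypothetical; a usable model would need a large word list and probably longer character contexts.

```python
import math
from collections import Counter


def train_char_bigrams(words):
    """Count character bigrams over words, with '^' and '$' marking start and end."""
    bigrams = Counter()
    contexts = Counter()
    alphabet = {"$"}  # symbols that can follow a context character
    for w in words:
        padded = "^" + w + "$"
        alphabet.update(w)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(alphabet)


def avg_log_prob(s, model):
    """Average add-one-smoothed log probability per character transition in s."""
    bigrams, contexts, vocab_size = model
    padded = "^" + s + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
    return total / (len(padded) - 1)


# Hypothetical toy training data, far too small for real use.
training_words = ["go", "god", "dad", "add", "daddy", "good", "day", "do", "odd", "gag"]
model = train_char_bigrams(training_words)

for candidate in ["dad", "ydg"]:
    print(candidate, round(avg_log_prob(candidate, model), 3))
```

A higher (less negative) score means the sequence looks more like the training words; picking a threshold for "word-like enough" is a separate decision that depends on the data.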

Also, there are many Englishes in the world. A "valid" word in British English might be an unknown word in American English. See http://www.ucl.ac.uk/english-usage/projects/ice.htm and https://en.wikipedia.org/wiki/World_Englishes#Classification_of_Englishes

Regarding "python - Get all possible English words from a string", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43159959/