Problem description

I have a list of sentences, and my aim is to replace every occurrence of abbreviated prepositions such as "opp, nr, off, abv, behnd" with their correct spellings "opposite, near, above, behind" and so on. The Soundex codes of the abbreviated and full words are the same, so I need to build an expression that iterates over this list word by word and, if the Soundex code matches, replaces the word with the right spelling.

An example - ['Jack was standing nr the tree' ,
'they were abv everything he planned' ,
'Just stand opp the counter' ,
'Go twrds the gas station']

So I need to replace the words nr, abv, opp and twrds with their full forms. The Soundex codes of towards and twrds are the same, so twrds should be replaced.
I need to iterate over this list.
Here's the Soundex algorithm:

import string

# Map every ASCII letter to its Soundex digit; "9" marks letters that are dropped later.
allChar = string.ascii_uppercase + string.ascii_lowercase
charToSoundex = str.maketrans(allChar, "91239129922455912623919292" * 2)

def soundex(source):
    "convert string to Soundex equivalent"

    # Soundex requirements:
    # source string must be at least 1 character
    # and must consist entirely of letters
    if (not source) or (not source.isalpha()):
        return "0000"

    # Soundex algorithm:
    # 1. make first character uppercase
    # 2. translate all other characters to Soundex digits
    digits = source[0].upper() + source[1:].translate(charToSoundex)

    # 3. remove consecutive duplicates
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

    # 4. remove all "9"s
    # 5. pad end with "0"s to 4 characters
    return (digits2.replace('9', '') + '000')[:4]

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        # a word was given on the command line: print its code
        print(soundex(sys.argv[1]))
    else:
        # no argument: time the function on a few sample names
        from timeit import Timer
        names = ('Woo', 'Pilgrim', 'Flingjingwaller')
        for name in names:
            statement = "soundex('%s')" % name
            t = Timer(statement, "from __main__ import soundex")
            print(name.ljust(15), soundex(name), min(t.repeat()))
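
Before relying on the assumption that every abbreviation shares a Soundex code with its full form, it may be worth checking the pairs from the question. The sketch below repeats the function above in Python 3 form so that it runs on its own; the `pairs` list and the `same_code` dictionary are illustrative names, not part of the original post. Under this particular implementation, nr, twrds and behnd match their full forms, but abv and opp do not.

```python
import string

# Same digit table as the question's code, one digit per letter A-Z
# ("9" marks letters that are removed at the end).
_LETTERS = string.ascii_uppercase + string.ascii_lowercase
_TABLE = str.maketrans(_LETTERS, "91239129922455912623919292" * 2)

def soundex(source):
    """Soundex code as computed by the question's algorithm."""
    if not source or not source.isalpha():
        return "0000"
    digits = source[0].upper() + source[1:].translate(_TABLE)
    collapsed = digits[0]
    for d in digits[1:]:
        if collapsed[-1] != d:      # drop consecutive duplicates
            collapsed += d
    return (collapsed.replace("9", "") + "000")[:4]

# Abbreviation / full-form pairs taken from the question (illustrative list).
pairs = [("nr", "near"), ("abv", "above"), ("opp", "opposite"),
         ("twrds", "towards"), ("behnd", "behind")]

same_code = {}
for short, full in pairs:
    same_code[short] = soundex(short) == soundex(full)
    print(short, soundex(short), full, soundex(full), same_code[short])

# Printed lines:
# nr N600 near N600 True
# abv A100 above A110 False
# opp O100 opposite O123 False
# twrds T632 towards T632 True
# behnd B530 behind B530 True
```

The mismatches come from step 3 running before step 4: the vowel between b and v in "above" keeps the two 1s apart, while in "abv" they are adjacent and collapse into one.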

I am a newbie, so if there's another approach you could suggest, it would be appreciated. Thanks.

Recommended answer

I would use the enchant module:

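import enchant
d = enchant.Dict("en_US")

phrase = ['Jack was standing nr the tree' ,
'they were abv everything he planned' ,
'Just stand opp the counter' ,
'Go twrds the gas station']

output = []
for section in phrase:
    words = []
    for word in section.split():
        if d.check(word):
            # correctly spelled: keep the word as it is
            words.append(word)
            continue
        # misspelled: take the first suggestion with the same Soundex code
        # (soundex() is the function from the question)
        for correct_word in d.suggest(word):
            if soundex(correct_word) == soundex(word):
                words.append(correct_word)
                break
        else:
            # no suggestion matched: keep the original word instead of dropping it
            words.append(word)
    output.append(' '.join(words))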
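
Since the question also asks for other approaches, one simple option when the set of abbreviations is small and known in advance is a plain lookup table. The sketch below is only an illustration under that assumption and is not part of the answer above: `ABBREVIATIONS` and `expand()` are hypothetical names, and the mapping is taken from the examples in the question.

```python
# Hypothetical lookup table: abbreviations and full forms come from the
# question; the table itself is an assumption made for this sketch.
ABBREVIATIONS = {
    "nr": "near",
    "abv": "above",
    "opp": "opposite",
    "twrds": "towards",
    "behnd": "behind",
}

def expand(sentence, table=ABBREVIATIONS):
    """Replace each whitespace-separated word found in `table` by its full form."""
    # Lookup is case-insensitive; words that are not in the table pass through unchanged.
    return " ".join(table.get(word.lower(), word) for word in sentence.split())

phrase = ['Jack was standing nr the tree',
          'they were abv everything he planned',
          'Just stand opp the counter',
          'Go twrds the gas station']

output = [expand(section) for section in phrase]
for line in output:
    print(line)

# Printed lines:
# Jack was standing near the tree
# they were above everything he planned
# Just stand opposite the counter
# Go towards the gas station
```

This needs neither a dictionary library nor matching Soundex codes, which helps for pairs such as abv/above whose codes differ under the question's function. The trade-off is that it only handles abbreviations listed in the table, and a word with punctuation attached (for example "nr,") is not recognised.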
