Hi,

Is there a clever way to see if two strings of the same length vary by only one character, and what the character is in both strings?

E.g. str1=yaqtil and str2=yaqtel: they differ at str1[4] and the difference is ('i','e').

But if there was str1=yiqtol and str2=yaqtel, I am not interested.

Can anyone suggest a simple way to do this?

My next problem is, I have a list of 300,000+ words and I want to find every pair of such strings. I thought I would first sort on length of string, but how do I iterate through the following:

    str1
    str2
    str3
    str4
    str5

so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5, str2 & str3, str3 & str4, str3 & str5, str4 & str5.

Thanks in advance,
Matthew

Solution

> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')
>
> But if there was str1=yiqtol and str2=yaqtel, I am not interested.
>
> can anyone suggest a simple way to do this?

Use the Levenshtein distance.

http://en.wikisource.org/wiki/Levenshtein_distance

> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings.
> I thought I would first sort on length of string, but how do I iterate
> through the following:
>
> str1 str2 str3 str4 str5
>
> so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
> str2 & str3, str3 & str4, str3 & str5, str4 & str5.

Decorate-sort-undecorate is the idiom for this:

    l = <list of strings>
    l = [(len(w), w) for w in l]
    l.sort()
    l = [w for _, w in l]

Diez

manstey wrote:
> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')

Something like this, maybe?

    >>> str1 = 'yaqtil'
    >>> str2 = 'yaqtel'
    >>> set(enumerate(str1)) ^ set(enumerate(str2))
    set([(4, 'e'), (4, 'i')])

-- Justin

manstey wrote:
> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')
>
> But if there was str1=yiqtol and str2=yaqtel, I am not interested.
>
> can anyone suggest a simple way to do this?
>
> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings.
> I thought I would first sort on length of string, but how do I iterate
> through the following:
>
> str1 str2 str3 str4 str5
>
> so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
> str2 & str3, str3 & str4, str3 & str5, str4 & str5.

If your strings are pretty short you can do it like this, even without sorting by length first:

    def fuzzy_keys(s):
        # One key per position, with that position replaced by a NUL wildcard.
        for pos in range(len(s)):
            yield s[:pos] + chr(0) + s[pos + 1:]

    def fuzzy_insert(d, s):
        for fuzzy_key in fuzzy_keys(s):
            if fuzzy_key in d:
                strings = d[fuzzy_key]
                if type(strings) is list:
                    # append the whole string; += would add its characters one by one
                    strings.append(s)
                else:
                    d[fuzzy_key] = [strings, s]
            else:
                d[fuzzy_key] = s

    def gather_fuzzy_matches(d):
        # Only keys shared by two or more strings hold a list.
        for strings in d.values():
            if type(strings) is list:
                yield strings

    acc = {}
    fuzzy_insert(acc, "yaqtel")
    fuzzy_insert(acc, "yaqtil")
    fuzzy_insert(acc, "oaqtil")
    print(list(gather_fuzzy_matches(acc)))

prints (the order of the two groups may differ):

    [['yaqtil', 'oaqtil'], ['yaqtel', 'yaqtil']]
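For the first half of the question (do two equal-length strings differ in exactly one place, and which characters are involved?), a direct comparison may be the simplest option. The following is only a sketch and is not taken from any of the replies in this thread; the function names `single_difference` and `one_off_pairs` are made up for illustration. It walks both strings in parallel with `zip`, and it uses `itertools.combinations` to visit every unordered pair within each length group, which is the iteration pattern the question asks about.

```python
from collections import defaultdict
from itertools import combinations


def single_difference(a, b):
    """Hypothetical helper: return (index, char_in_a, char_in_b) if a and b
    have the same length and differ at exactly one position, else None."""
    if len(a) != len(b):
        return None
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


def one_off_pairs(words):
    """Hypothetical helper: yield (w1, w2, difference) for every pair of
    same-length words that differ in exactly one character."""
    # Group by length so only strings of equal length are ever compared.
    by_length = defaultdict(list)
    for w in dict.fromkeys(words):  # drop duplicates, keep input order
        by_length[len(w)].append(w)
    for group in by_length.values():
        # combinations(group, 2) visits each unordered pair exactly once.
        for w1, w2 in combinations(group, 2):
            diff = single_difference(w1, w2)
            if diff is not None:
                yield w1, w2, diff


if __name__ == "__main__":
    print(single_difference("yaqtil", "yaqtel"))
    for match in one_off_pairs(["yaqtel", "yaqtil", "oaqtil", "yiqtol"]):
        print(match)
```

The pairwise loop is quadratic in the size of each length group, so for a list of 300,000+ words it is likely to be far too slow. An approach that builds wildcard keys for each word and groups words sharing a key does work roughly proportional to the total number of characters instead, and `single_difference` can then be applied only to the candidate groups it produces.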
09-12 11:50