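I'm trying to use SequenceMatcher from Python's difflib package to measure string similarity. I've run into strange behavior with it, though, and I suspect the problem is related to the package's "junk" filter, which is described in detail here. I thought I could work around it by passing the autojunk flag to SequenceMatcher in the way the difflib documentation describes: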

import difflib

def matches(s1, s2):
    s = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
    match = [s1[i:i+n] for i, j, n in s.get_matching_blocks() if n > 0]
    return match

print matches("they all are white a sheet of spotless paper when they first are born but they are to be scrawled upon and blotted by every goose quill", "you are all white a sheet of lovely spotless paper when you first are born but you are to be scrawled and blotted by every gooses quill")


But this produces the following error message:

Traceback (most recent call last):
  File "test3.py", line 8, in <module>
    print matches("they all are white a sheet of spotless paper when they first are born but they are to be scrawled upon and blotted by every goose quill", "you are all white a sheet of lovely spotless paper when you first are born but you are to be scrawled and blotted by every gooses quill")
  File "test3.py", line 4, in matches
    s = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
TypeError: __init__() got an unexpected keyword argument 'autojunk'


Does anyone know how I can pass the autojunk=False flag to SequenceMatcher? I'd be very grateful for any advice.

Best answer

According to the SequenceMatcher documentation:


  The optional argument autojunk can be used to disable the automatic junk heuristic.
  
  New in version 2.7.1: The autojunk parameter.


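The TypeError means your interpreter predates that release, so SequenceMatcher's __init__() does not accept the keyword. Upgrade to Python 2.7.1 or later to use the autojunk parameter.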
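If upgrading isn't possible right away, here is a minimal sketch of one way to keep the script running on both old and new interpreters. It tries to pass autojunk=False and falls back to the plain constructor when the keyword is rejected. The helper name make_matcher is my own and not part of difflib; the fallback assumes that interpreters without the parameter also lack the automatic junk heuristic, which is worth verifying on your version.

```python
import difflib


def make_matcher(s1, s2):
    """Build a SequenceMatcher, disabling autojunk where it is supported.

    make_matcher is a hypothetical helper, not a difflib function.
    """
    try:
        # Accepted on Python 2.7.1+ (and Python 3.2+).
        return difflib.SequenceMatcher(None, s1, s2, autojunk=False)
    except TypeError:
        # Older interpreters reject the keyword, so fall back to the
        # three-argument form (assumption: no autojunk heuristic there).
        return difflib.SequenceMatcher(None, s1, s2)


def matches(s1, s2):
    s = make_matcher(s1, s2)
    # The final block returned by get_matching_blocks() is a sentinel
    # with n == 0, which the filter drops.
    return [s1[i:i + n] for i, j, n in s.get_matching_blocks() if n > 0]
```

With this in place, matches() from the question can be called unchanged on either interpreter.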
