I have two examples of string pairs:
YHFLSPYVY      # answer
   LSPYVYSPR   # prediction
+++******ooo


  YHFLSPYVS    # answer
VEYHFLSPY      # prediction
oo*******++

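As shown above, I want to find the overlapping region (*) and the non-overlapping regions in the answer (+) and in the prediction (o).

How can I do this in Python?

I am stuck with this: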
import re
# This is example 1
ans = "YHFLSPYVY"
pred = "LSPYVYSPR"
# Lookahead search for the complete prediction inside the answer
matches = re.finditer(r'(?=(%s))' % re.escape(pred), ans)
print([m.start(1) for m in matches])
# []
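
A side note on why the list is empty: the pattern searches for the whole prediction inside the answer, and the whole prediction never occurs there. The sketch below illustrates this. The slice length 6 is an assumption taken from the marker line of example 1, not something the regex discovers, and the names whole_matches and partial_matches are only illustrative.

```python
import re

ans = "YHFLSPYVY"
pred = "LSPYVYSPR"

# The lookahead matches only where the complete prediction occurs in the answer.
whole_matches = [m.start(1) for m in re.finditer(r'(?=(%s))' % re.escape(pred), ans)]
print(whole_matches)

# Only the leading part of the prediction is present in the answer.
# The length 6 is read off the marker line of example 1 (an assumption).
partial_matches = [m.start(1) for m in re.finditer(r'(?=(%s))' % re.escape(pred[:6]), ans)]
print(partial_matches)
```

So a fixed pattern is not enough; the length of the shared part has to be found as well.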

For example 1, the result I hope to get is:
plus_len = 3
star_len = 6
ooo_len = 3

Best answer

This is easy with difflib.SequenceMatcher.find_longest_match:

from difflib import SequenceMatcher

def f(answer, prediction):
    sm = SequenceMatcher(a=answer, b=prediction)
    # Longest block of characters shared by both strings
    match = sm.find_longest_match(0, len(answer), 0, len(prediction))
    star_len = match.size
    # (answer-only length, overlap length, prediction-only length)
    return (len(answer) - star_len, star_len, len(prediction) - star_len)

The function returns a tuple of three integers (plus_len, star_len, ooo_len):
f('YHFLSPYVY', 'LSPYVYSPR') -> (3, 6, 3)
f('YHFLSPYVS', 'VEYHFLSPY') -> (2, 7, 2)
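
If the marker line itself is wanted, the positions in the match object can be used as well. The following is only a sketch under one assumption: the two strings touch cleanly, so on each side of the shared block at most one string sticks out. The function name annotate and the ValueError are illustrative choices, not part of the answer above.

```python
from difflib import SequenceMatcher

def annotate(answer, prediction):
    """Build the marker line: '+' answer only, '*' overlap, 'o' prediction only."""
    sm = SequenceMatcher(a=answer, b=prediction, autojunk=False)
    m = sm.find_longest_match(0, len(answer), 0, len(prediction))
    # Number of characters of each string before the shared block
    left_a, left_b = m.a, m.b
    # Number of characters of each string after the shared block
    right_a = len(answer) - (m.a + m.size)
    right_b = len(prediction) - (m.b + m.size)
    # Assumption: a clean overlap, so only one string sticks out on each side
    if (left_a and left_b) or (right_a and right_b):
        raise ValueError("strings do not overlap cleanly")
    return ("+" * left_a + "o" * left_b + "*" * m.size
            + "+" * right_a + "o" * right_b)

print(annotate("YHFLSPYVY", "LSPYVYSPR"))
print(annotate("YHFLSPYVS", "VEYHFLSPY"))
```

For the two pairs from the question this reproduces the marker lines +++******ooo and oo*******++. Note that find_longest_match returns the longest common block anywhere in the strings, which is why the sketch checks the flanks before building the line.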

About python - finding the touching and non-touching parts of two strings, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38613045/
