问题描述
给出2个字符串 s
和 t
。我需要为 s
中的每个子字符串找到编辑距离(Levenshtein距离)到 t
。实际上,我需要知道 s
中每个 i
位置,从位置 i 。
Given 2 strings s
and t
. I need to find for each substring in s
edit distance(Levenshtein distance) to t
. Actually I need to know for each i
position in s
what is the minimum edit distance for all substrings started at position i
.
例如:
t = "ab"
s = "sdabcb"
需要得到类似的东西:
{2,1,0,2,2}
说明:
1st position:
distance("ab", "sd") = 4 ( 2*subst )
distance("ab", "sda") = 3( 2*delete + insert )
distance("ab", "sdab") = 2 ( 2 * delete)
distance("ab", "sdabc") = 3 ( 3 * delete)
distance("ab", "sdabcb") = 4 ( 4 * delete)
So, minimum is 2
2nd position:
distance("ab", "da") = 2 (delete + insert)
distance("ab", "dab") = 1 (delete)
distance("ab", "dabc") = 2 (2*delete)
....
So, minimum is 1
3th position:
distance("ab", "ab") = 0
...
minimum is 0
我当然可以使用蛮力算法来解决此任务。但是有更快的算法吗?
I can use brute force algorithm to solve this task, of course. But is there faster algorithm?
感谢帮助。
推荐答案
Wagner-Fischer算法可以您可以免费获得所有前缀的答案。
The Wagner-Fischer algorithm gives you the answer for all prefixes "for free".
Wagner-Fischer矩阵包含从 s
的每个前缀到 t
的编辑距离。
The last row of the Wagner-Fischer matrix contains the edit distance from each prefix of s
to t
.
首先要解决您的问题,对于每个 i
,运行Wagner-Fischer并选择最后一行中最小的元素。
So as a first crack at your problem, for each i
, run Wagner-Fischer and select the smallest element in the last row.
我很想知道是否有人知道(或可以找到)更好的方法。
I will be curious to see if anyone else knows (or can find) a better approach.
这篇关于查找到所有子字符串的编辑距离的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!