使用递归和DP的最长公共子串

本文介绍了使用递归和DP的最长公共子串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试使用递归和DP查找两个字符串的最长公共子字符串.请注意，我指的不是最长连续子序列.所以，如果两个字符串是

I'm trying to find the Longest Common Substring of two strings using Recursion and DP. Please note that I'm not referring to Longest Contiguous subsequence. So, if the two strings were

String s1 = "abcdf"; String s2 = "bzcdf"
Longest Common Substring == "cdf" (not "bcdf").
Basically they have to be continuous elements

我正在尝试使用递归和回溯.但是，问题是，如果我使用如下所示的递归，则 +1 会被添加到帧的前面，即在调用堆栈中的较高位置，并且不知道是否要输入字符确实是连续的要素还是没有.因此，按照上面的示例，"bcdf"将是答案.

I am trying to do this using recursion and backtracking. However, the problem is that if I use a recursion such as below, the +1 are added upfront in a frame, that is higher up in the call stack, and unaware of whether the characters to come are indeed continuous elements or no. And so, going by the example above, "bcdf" would be the answer.

public class ThisIsLongestCommonSubsequence_NotSubstring {
public static void main(String[] args) {

    String s1 = "abcdgh";
    String s2 = "abefgh";
    System.out.println(fun(s1, s1.length()-1, s2, s2.length()-1));
}

static int fun(String s1, int i, String s2, int j)
{
    if(i == -1 || j == -1)
        return 0;

    int ret = 0;
    if(s1.charAt(i) == s2.charAt(j))
        ret = fun(s1, i-1, s2, j-1) + 1;
    else
        ret = max(fun(s1, i-1, s2, j), fun(s1, i, s2, j-1));

    return ret;
}

static int max(int a, int b)
{
    return a>b?a:b;
}
}

就目前而言，下面的代码是我想出的.请注意，每次发现不匹配时，如何将计数重置为0.并使用名为 int count 的变量跟踪匹配字符的数量，并使用名为 int maxcount 的变量记录程序中任何位置的最高字符数.我的代码如下.

As for now, the code below is what I have come up with. Note how, I reset the count to 0, every time I find a mismatch. And keep track of the number of matching characters using a variable called int count, and record the highest at any point in program using a variable called int maxcount. My code below.

public class LongestContinuousSubstringGlobalvariable {

static int maxcount = 0;

public static void main(String[] args) {
    String s1 = "abcdghijl";
    String s2 = "abefghijk";

    fun(s1, s2, s1.length()-1, s2.length()-1, 0);
    System.out.println("maxcount == "+maxcount);
}

static void fun(String s1, String s2, int i, int j, int count)
{
    if(i == -1 || j==-1)
        return;

    if(s1.charAt(i) == s2.charAt(j))
    {
        if(count+1 >  maxcount)
            maxcount = count+1;
        fun(s1, s2, i-1, j-1, count+1);
    }
    else
    {
        fun(s1, s2, i-1, j, 0);
        fun(s1, s2, i, j-1, 0);
    }
}
}

这很好.但是，我对代码有些不满意的地方

This works fine. However, there are couple of things I don't like about my code

使用全局变量(静态int maxcount)在各帧之间进行比较
我认为这不是真正的动态编程或回溯，因为较低的帧不会返回，它会输出到较高的帧，然后再决定如何处理.

Use of the global variable (static int maxcount) to compare across frames
I don't think this is real dynamic programming or backtracking, since the lower frame is not returning it's output to a higher frame, which then decides what to do with it.

请提供您关于如何在不使用全局变量和使用回溯的情况下实现此目标的意见.

Please give me your inputs on how I can achieve this without the use of the global variable, and using backtracking.

PS:我知道解决该问题的其他方法，例如保持矩阵状态以及进行类似

PS : I am aware of other approaches to the problem, like keeping a matrix, and doing something like

目标不是解决问题，而是找到一种优雅的递归/回溯解决方案.

The objective is not to solve the problem, but to find an elegant recursive/backtracking solution.

推荐答案

可以在Prolog中完成.以下是我可以在本文中提供帮助的代码: Foreach无法在Prolog中使用， http://obvcode.blogspot.in/2008/11/working-with-strings-in-prolog.html 和

It could probably be done in Prolog. Following is the code which I could put down with help from this post: Foreach not working in Prolog , http://obvcode.blogspot.in/2008/11/working-with-strings-in-prolog.html and How do I find the longest list in a list of lists?

myrun(S1, S2):-
    writeln("-------- codes of first string ---------"),
    string_codes(S1, C1list),
    writeln(C1list),

    writeln("-------- codes of second string ---------"),
    string_codes(S2, C2list),
    writeln(C2list),

    writeln("--------- substrings of first --------"),
    findall(X, sublist(X, C1list), L),
    writeln(L),

    writeln("--------- substrings of second --------"),
    findall(X, sublist(X, C2list), M),
    writeln(M),

    writeln("------ codes of common substrings -------"),
    intersection(L,M, Outl),
    writeln(Outl),

    writeln("--------- common strings in one line -------"),
    maplist(string_codes, Sl, Outl),
    writeln(Sl),
    writeln("------ common strings one by one -------"),
    maplist(writeln, Sl),

    writeln("------ find longest -------"),
    longest(Outl, LongestL),
    writeln(LongestL),
    string_codes(LongestS, LongestL),
    writeln(LongestS).

sublist(S, L) :-
  append(_, L2, L),
  append(S, _, L2).

longest([L], L) :-
   !.
longest([H|T], H) :-
   length(H, N),
   longest(T, X),
   length(X, M),
   N > M,
   !.
longest([H|T], X) :-
   longest(T, X),
   !.

它会显示所有步骤:将字符串转换为代码，然后从这两个字符串中提取所有可能的子字符串，然后找到常见的子字符串并列出它们:

It runs showing all the steps: It convert strings to codes, then make all possible substrings from both, then find those which are common and lists them:

?- myrun("abcdf", "bzcdf").
-------- codes of first string ---------
[97,98,99,100,102]
-------- codes of second string ---------
[98,122,99,100,102]
--------- substrings of first --------
[[],[97],[97,98],[97,98,99],[97,98,99,100],[97,98,99,100,102],[],[98],[98,99],[98,99,100],[98,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- substrings of second --------
[[],[98],[98,122],[98,122,99],[98,122,99,100],[98,122,99,100,102],[],[122],[122,99],[122,99,100],[122,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
------ codes of common substrings -------
[[],[],[98],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- common strings in one line -------
[,,b,,c,cd,cdf,,d,df,,f,]
------ common strings one by one -------


b

c
cd
cdf

d
df

f

------ find longest -------
[99,100,102]
cdf
true.

最后忽略"true".

Ignore the 'true' at end.

如果删除了说明部分，则程序要短得多:

If explanatory parts are removed, program is much shorter:

myrun(S1, S2):-
    string_codes(S1, C1list),
    string_codes(S2, C2list),
    findall(X, sublist(X, C1list), L),
    findall(X, sublist(X, C2list), M),
    intersection(L,M, Outl),
    longest(Outl, LongestL),
    string_codes(LongestS, LongestL),
    writeln(LongestS).

sublist(S, L) :-
  append(_, L2, L),
  append(S, _, L2).

longest([L], L) :-
   !.
longest([H|T], H) :-
   length(H, N),
   longest(T, X),
   length(X, M),
   N > M,
   !.
longest([H|T], X) :-
   longest(T, X),
   !.


?- myrun("abcdf", "bzcdf").
cdf
true.

这篇关于使用递归和DP的最长公共子串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！