我写了一些java代码来实现smith-waterman算法。我有一篇日记,上面写着

sim  (X, Y)  = 2 . SLength  (X, Y) / Length  ( X)  +  Length  (Y  )


X  and  Y are  the  sequences  for  comparison.
SLength(X, Y) is  the  string length of the  maximum matching set.
Length(X) is the  number  of  characters  in  the  sequence X.
Length(Y)  is  the     number  of  characters  in  the  sequence  Y.
The  result  of  this function (sim)  is a real  number,  O<=sim<=1.
The larger  SIM is, the  stronger  the  two  programs  similarity  is,
and  the  plagiarism possibility larger,  vice versa.

这是我的史密斯·沃特曼密码
public class SmithWaterman {

private final String one, two;
private final int matrix[][];
private int gap;
private final int match;
private final int o;
private int l;
private final int e;

public SmithWaterman(String one, String two) {
    this.one = "-" + one.toLowerCase();
    this.two = "-" + two.toLowerCase();
    this.match = 2;

    // Define affine gap starting values
    o = -2;
    l = -1;
    e = -1;

    // initialize matrix to 0
    matrix = new int[one.length() + 1][two.length() + 1];
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            matrix[i][j] = 0;
        }
    }

}

// returns the alignment score
/**
 * @return
 */
public double computeSmithWaterman() {
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            gap = o + (l - 1) * e;
            if (i != 0 && j != 0) {
                if (one.charAt(i) == two.charAt(j)) {
                    // match
                    // reset l
                    l = 0;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + match, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                } else {
                    // gap
                    l++;
                    matrix[i][j] = Math.max(0, Math.max(
                            matrix[i - 1][j - 1] + gap, Math.max(
                                    matrix[i - 1][j] + gap,
                                    matrix[i][j - 1] + gap)));
                }
            }
        }
    }

    // find the highest value
    double longest = 0;
    int iL = 0, jL = 0;
    for (int i = 0; i < one.length(); i++) {
        for (int j = 0; j < two.length(); j++) {
            if (matrix[i][j] > longest) {
                longest = matrix[i][j];
                iL = i;
                jL = j;
            }
        }
    }

    // Backtrack to reconstruct the path
    int i = iL;
    int j = jL;
    Stack<String> actions = new Stack<String>();

    while (i != 0 && j != 0) {
        // diag case
        if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i - 1][j - 1]) {
            actions.push("align");
            i = i - 1;
            j = j - 1;
            // left case
        } else if (Math.max(matrix[i - 1][j - 1],
                Math.max(matrix[i - 1][j], matrix[i][j - 1])) == matrix[i][j - 1]) {
            actions.push("insert");
            j = j - 1;
            // up case
        } else {
            actions.push("delete");
            i = i - 1;
        }
    }

    int maxMatchSet = actions.size();

    String alignOne = new String();
    String alignTwo = new String();

    Stack<String> backActions = (Stack<String>) actions.clone();
    for (int z = 0; z < one.length(); z++) {
        alignOne = alignOne + one.charAt(z);
        if (!actions.empty()) {
            String curAction = actions.pop();

            if (curAction.equals("insert")) {
                alignOne = alignOne + "-";
                while (actions.peek().equals("insert")) {
                    alignOne = alignOne + "-";
                    actions.pop();
                }
            }
        }
    }

    for (int z = 0; z < two.length(); z++) {
        alignTwo = alignTwo + two.charAt(z);
        if (!backActions.empty()) {
            String curAction = backActions.pop();
            if (curAction.equals("delete")) {
                alignTwo = alignTwo + "-";
                while (backActions.peek().equals("delete")) {
                    alignTwo = alignTwo + "-";
                    backActions.pop();
                }
            }
        }
    }
    int minMatchSet = backActions.size();

    // print alignment
    double realLengthStringOne = one.length() - 1;
    double realLenghtStringTwo = two.length() - 1;
    double totalOfMatricesElement = realLengthStringOne + realLenghtStringTwo;

    double value = (2 * maxMatchSet / totalOfMatricesElement) * 100;

    System.out.println("2 * " + maxMatchSet + " / " + "( " + realLengthStringOne + " + " + realLenghtStringTwo + " ) " + "= " + value + "%");


    return value;
}

public void printMatrix() {

    for (int i = 0; i < one.length(); i++) {
        if (i == 0) {
            for (int z = 0; z < two.length(); z++) {
                if (z == 0) {
                    System.out.print("  \t");
                }
                System.out.print(two.charAt(z) + " \t");

                if (z == two.length() - 1) {
                    System.out.println();
                }
            }
        }

        for (int j = 0; j < two.length(); j++) {
            if (j == 0) {
                System.out.print(one.charAt(i) + " \t");
            }
            System.out.print(matrix[i][j] + " \t");
        }
        System.out.println();
    }
    System.out.println();
}

public static void main(String[] args) {
    // DNA sequence Test:

    SmithWaterman sw = new SmithWaterman("ahmad", "achmad");
    System.out.println("Alignment Score: " + sw.computeSmithWaterman());

    sw.printMatrix();

}
}

如果我有两个序列,如“ahmad”,“ahmad”,则输出=100%,
但是你知道,如果我有两个序列,像“ahmad”,“achmad”,输出如下:
2 * 6 / ( 5.0 + 6.0 ) = 109.09090909090908%

Alignment Score: 109.09090909090908
  -     a   c   h   m   a   d
-   0   0   0   0   0   0   0
a   0   2   1   0   0   2   1
h   0   0   0   3   2   0   0
m   0   0   0   0   5   4   2
a   0   2   1   0   2   7   6
d   0   0   0   0   0   1   9

我有没有在实现中工作,我在哪里丢失了代码?

最佳答案

回答你的直接问题为什么你得到150%
你的变量longest = 6和你的设置值onetwo实际上分别是'-' + one'-'+ two,所以数学是2 * 6 / 8 = 12 /8 = 1.5*100=150%。
如果使用onetwo的原始长度,您可能会得到正确的答案。
不过,我认为你的方法可能有缺陷:
变量longest不是匹配的长度,而是矩阵中的最高得分这是史密斯·沃特曼校准分数。目前这是可行的,因为您正在对齐一个完美匹配,并使用+2的匹配分数,但是我不确定这是否适用于非完美匹配。
该值表示通过矩阵的最佳评分(部分)对齐路径。虽然这条路通常是最长的,但它不一定是。其他地方可能会有更长但更糟糕的得分路径。
此外,你的比赛惩罚开放差距-2和延伸-1意味着多个连续差距将使你的比赛分数不再是偶数。
要实际查看对齐的时间,您必须从您的最高得分位置开始跟踪矩阵,直到到达矩阵中得分0的单元格(因为这是Smith Waterman,它允许与全长全局路线相对的局部路线)。
您已经在actions代码块中执行了类似的操作。但是,您可能需要考虑如何将插入和删除视为长度的一部分。如果你想计算它们,那么最长的对齐长度就是actions.size()

09-26 01:10