问题描述
我正在尝试使用仿射间隙罚分函数实现Smith-Waterman算法以进行局部序列比对.我想我了解如何初始化和计算计算比对分数所需的矩阵,但是对于如何追溯然后找到比对却一无所知.要生成所需的3个矩阵,我需要以下代码
I'm trying to implement the Smith-Waterman algorithm for local sequence alignment using the affine gap penalty function. I think I understand how to initiate and compute the matrices required for calculating alignment scores, but am clueless as to how to then traceback to find the alignment. To generate the 3 matrices required I have the following code
for j in range(1, len2):
for i in range(1, len1):
fxOpen = F[i][j-1] + gap
xExtend = Ix[i][j-1] + extend
Ix[i][j] = max(fxOpen, xExtend)
fyOpen = F[i-1][j] + gap
yExtend = Iy[i-1][j] + extend
Iy[i][j] = max(fyOpen, yExtend)
matchScore = (F[i-1][j-1] + simMatrixDict[seq1[i-1]+seq2[j-1]])
xScore = Ix[i-1][j-1] + simMatrixDict[seq1[i-1]+seq2[j-1]]
yScore = Iy[i-1][j-1] + simMatrixDict[seq1[i-1]+seq2[j-1]]
F[i][j] = max(0, matchScore, xScore, yScore)
我不确定是否需要单个矩阵进行追溯,还是仅需1个?任何有关如何从F中的最大分数追溯的澄清都将不胜感激.
I am unsure if I need a single matrix for traceback, or just 1? Any clarification on how to go about tracing back from the max score in F would be much appreciated.
推荐答案
关于Smith-Waterman中的回溯,要记住的重要一点是,值所在的矩阵确定了移动的方向.因此,如果您在F
中,您将沿对角线移动;如果您在Ix
中,则将水平移动;如果您在Iy
中,则将垂直移动.这意味着您需要存储在指针矩阵中的就是从其到达一个正方形的矩阵.您要使用的矩阵(而不是要使用的矩阵)确定要去的方向.
The important thing to remember about traceback in Smith-Waterman is that the matrix a value is in determines the direction that you move. So, if you are in F
you're moving diagonally, if you're in Ix
, you're moving horizontally, and if you're in Iy
, you're moving vertically. This means that all you need to store in the pointer matrix is the matrix that you arrived at a square from. The matrix that you are coming from, not the one that you are going to, determines the dirction which to go.
例如:
说您在F[5][5]
:
- 如果指针矩阵说要转到
Ix
,请转到Ix[4][4]
- 如果指针矩阵说要转到
Iy
,请转到Iy[4][4]
- 如果指针矩阵说要转到
F
,请转到F[4][4]
- If pointer matrix says to go to
Ix
, go toIx[4][4]
- If pointer matrix says to go to
Iy
, go toIy[4][4]
- If pointer matrix says to go to
F
, go toF[4][4]
如果您在Ix[5][5]
:
- 如果指针矩阵说要转到
Ix
,请转到Ix[4][5]
- 如果指针矩阵说要转到
F
,请转到F[4][5]
- If pointer matrix says to go to
Ix
, go toIx[4][5]
- If pointer matrix says to go to
F
, go toF[4][5]
或者如果您在Iy[5][5]
:
- 如果指针矩阵说要转到
Iy
,请转到Iy[5][4]
- 如果指针矩阵说要转到
F
,请转到F[5][4]
- If pointer matrix says to go to
Iy
, go toIy[5][4]
- If pointer matrix says to go to
F
, go toF[5][4]
假定第一个索引是x坐标,第二个索引是y坐标.
Assuming that the first index is the x coordinate and the second is the y coordinate.
继续追溯,直到到达最大值为0的单元格.
Continue tracing back until you reach a cell with a maximum value of 0.
构建指针矩阵:对于F
,Ix
和Iy
,每个都需要一个指针矩阵.这些矩阵仅需要指示值来自哪个矩阵,因为它告诉您前进的方向.因此,当您在算法的动态编程阶段中运行时,您还应该构建指针矩阵.每次在F
,Ix
或Iy
的单元格中存储新的最大值时,都应更新相应的矩阵以指示其来源.例如,如果在F[4][4]
中可以通过对齐两个下两个碱基来获得F[5][5]
中的最大值,则Fpointer [5] [5]应该设置为F
,因为从F
矩阵开始.
Building the pointer matrix:You need one pointer matrix each for F
, Ix
, and Iy
. These matrices only need to indicate which matrix a value came from, because that tells you which direction you were moving in. So, as you're running through the dynamic programming phase of the algorithm, you should also be building up the pointer matrices. Every time you store a new maximum value in a cell in F
, Ix
, or Iy
, you should update the corresponding matrix to indicate where it came from. If, for instance, the highest value you can have in F[5][5]
comes by aligning the two next bases when you're in F[4][4]
, the Fpointer[5][5] should be set to F
, because you got there from the F
matrix.
这篇关于具有仿射间隙罚分的Smith-Wateman算法中的追溯的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!