我正在尝试实现Waterman-Eggert算法以查找次优的局部序列比对,但仍在努力了解如何“分簇”各个比对组。我的基本Smith-Waterman算法工作正常。
一个简单的测试,将以下顺序与自身对齐:
'HEAGHEAGHEAG'
'HEAGHEAGHEAG'
产生一个fMatrix,如下所示:
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 8. 0. 0. 0. 8. 0. 0. 0. 8. 0. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 13. 0. 0. 0. 13. 0. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 17. 0. 0. 0. 17. 0.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 23. 0. 0. 0. 23.]
[ 0. 8. 0. 0. 0. 31. 0. 0. 0. 31. 0. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 36. 0. 0. 0. 36. 0. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 40. 0. 0. 0. 40. 0.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 46. 0. 0. 0. 46.]
[ 0. 8. 0. 0. 0. 31. 0. 0. 0. 54. 4. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 36. 0. 0. 4. 59. 9. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 40. 0. 0. 9. 63. 13.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 46. 0. 0. 13. 69.]]
为了找到次优的比对,例如
'HEAGHEAGHEAG '
' HEAGHEAGHEAG'
您必须先删除最佳对齐方式(即沿主对角线),然后重新计算fMatrix;这被称为“去聚集”,其中比对的“簇”被定义为其路径相交/共享一对或多对比对的残基的任何比对。除fMatrix之外,还有一个辅助矩阵,其中包含有关fMatrix构造方向的信息。
用于构建fMatrix和回溯矩阵的代码片段如下:
# Generates fMatrix.
for i in range(1, length):
for j in range(1, length):
matchScore = fMatrix[i-1][j-1] + simMatrixDict[seq[i-1]+seq[j-1]]
insScore = fMatrix[i][j-1] + gap
delScore = fMatrix[i-1][j] + gap
fMatrix[i][j] = max(0, matchScore, insScore, delScore)
# Generates matrix for backtracking.
if fMatrix[i][j] == matchScore:
backMatrix[i][j] = 2
elif fMatrix[i][j] == insScore:
backMatrix[i][j] = 3 # INSERTION in seq - Horizontal
elif fMatrix[i][j] == delScore:
backMatrix[i][j] = 1 # DELETION in seq - Vertical
if fMatrix[i][j] >= backtrackStart:
backtrackStart = fMatrix[i][j]
endCoords = i, j
return fMatrix, backMatrix, endCoords
为了删除此最佳对齐方式,我尝试使用此backMatrix来回溯fMatrix(按照原始的Smith-Waterman算法)并按我的方式设置
fMatrix[i][j] = 0
,但这不会删除整个团,仅删除其中的确切对齐方式丛。有关一些背景信息,Smith-Waterman算法的Wikipedia页面介绍了fMatrix的构造方式,并且在here上还提供了有关回溯工作原理的说明。 Waterman-Eggert算法大致解释为here。
谢谢。
最佳答案
好的。这是一些代码,可以执行您想要的操作。我使用了 pretty-print 库(pprint
),以便输出看起来不错。 (如果矩阵中的数字是一位数字,则看起来最好,但是如果有多位数字,则对齐会有些困惑。)
它是如何工作的?
因为您只需要更改主对角线以及上面和下面的对角线上的数字,所以我们只需要一个for循环。 matrix[i][i]
始终位于主对角线上,因为它是i
行向下,而i
列则位于其对角线。 matrix[i][i-1]
始终是相邻的下对角线,因为它向下是i
行,而在整个i-1
列上。 matrix[i-1][i]
始终是相邻的上对角线,因为它向下是i-1
行,而在两端是i
行。
#!/usr/bin/python
import pprint
matrix = [
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,],
[ 0, 8, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 17, 0, 0, 0, 17, 0,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 23, 0, 0, 0, 23,],
[ 0, 8, 0, 0, 0, 31, 0, 0, 0, 31, 0, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 36, 0, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 40, 0,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 46,],
[ 0, 8, 0, 0, 0, 31, 0, 0, 0, 54, 4, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 36, 0, 0, 4, 59, 9, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 9, 63, 13,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 13, 69,]]
print "Original Matrix"
pprint.pprint(matrix)
print
for i in range(len(matrix)):
matrix[i][i] = 0
if (i > 0) and (i < (len(matrix))):
matrix[i][i-1] = 0
matrix[i-1][i] = 0
print "New Matrix"
pprint.pprint(matrix)
输出:
Original Matrix
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 8, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 17, 0, 0, 0, 17, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 23, 0, 0, 0, 23],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 31, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 36, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 40, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 46],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 54, 4, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 4, 59, 9, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 9, 63, 13],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 13, 69]]
New Matrix
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 17, 0, 0, 0, 17, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 23],
[0, 8, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 40, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 46],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 0]]
关于python - 实现沃特曼-埃格特算法,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/16611706/