I have a set of n (~1,000,000) strings (DNA sequences) stored in a list `trans`. I need to find the minimum Hamming distance between any two sequences in the list. I implemented a naive brute-force algorithm, which has been running for more than a day and still has not finished. My code is:
```python
dmin = len(trans[0])
for i in xrange(len(trans)):
    for j in xrange(i + 1, len(trans)):
        # [:-1] drops the last character of each string (e.g. a trailing newline)
        dist = hamdist(trans[i][:-1], trans[j][:-1])
        if dist < dmin:
            dmin = dist
```
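For scale: the double loop calls `hamdist` once for every unordered pair of sequences, and each call compares up to L characters (L being the sequence length). With n ≈ 10^6 the number of pairs is

$$\binom{n}{2} = \frac{n(n-1)}{2} = \frac{10^{6}\,(10^{6}-1)}{2} \approx 5 \times 10^{11}$$

At a purely hypothetical rate of 10^6 pairs per second, that would take about 5 × 10^5 s, close to six days.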
Is there a more efficient way to do this? Here `hamdist` is a function I wrote to compute the Hamming distance:
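```python
def hamdist(str1, str2):
    diffs = 0
    # Strings of different lengths are treated as maximally distant.
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    # Count the positions where the two strings differ.
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
    return diffs
```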
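For comparison, the same `hamdist` can be written more compactly in Python 3 by summing a generator expression (each `True` counts as 1). This is a sketch with the same behavior as the function above, including its convention for strings of unequal length; the example strings are made up for illustration. It tidies the per-pair work but does not reduce the number of pairs the double loop has to compare.

```python
def hamdist(str1, str2):
    """Number of positions at which str1 and str2 differ."""
    if len(str1) != len(str2):
        # Same convention as above: mismatched lengths get the longer length.
        return max(len(str1), len(str2))
    # Booleans add as 0/1, so this counts the mismatching positions.
    return sum(ch1 != ch2 for ch1, ch2 in zip(str1, str2))


print(hamdist("GATTACA", "GACTATA"))  # → 2
```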
You could optimize your `hamdist` function by adding an optional parameter holding the minimum distance found so far. Once `diffs` reaches that value you can stop calculating the distance, because this pair cannot produce a distance smaller than the current minimum:
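```python
def hamdist(str1, str2, prevMin=None):
    diffs = 0
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
            # Once diffs reaches the best distance so far, this pair
            # cannot be a new minimum, so give up on it.
            if prevMin is not None and diffs >= prevMin:
                return None
    return diffs
```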
You will need to adapt your main loop to handle a `None` return value from `hamdist`, and to pass it the current minimum as `prevMin` (otherwise the early exit never triggers):
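```python
dmin = len(trans[0])
for i in xrange(len(trans)):
    for j in xrange(i + 1, len(trans)):
        # Pass the current minimum so hamdist can stop early.
        dist = hamdist(trans[i][:-1], trans[j][:-1], dmin)
        # None means "not smaller than dmin", so it is skipped.
        if dist is not None and dist < dmin:
            dmin = dist
```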
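Putting both pieces together, a self-contained Python 3 sketch (using `range` instead of `xrange`) might look like the following. The function name `min_hamming_distance` and the sample sequences are hypothetical. The sketch assumes the sequences have already had any trailing newline removed, all have the same length, and that there are at least two of them. The `dmin == 0` shortcut is an extra not present in the answer above: once two identical sequences are found, no smaller distance is possible.

```python
def hamdist(str1, str2, prevMin=None):
    """Hamming distance, or None once it is known to be >= prevMin."""
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    diffs = 0
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
            # This pair can no longer beat the current minimum: stop early.
            if prevMin is not None and diffs >= prevMin:
                return None
    return diffs


def min_hamming_distance(seqs):
    """Smallest pairwise Hamming distance (assumes equal-length seqs, len >= 2)."""
    dmin = len(seqs[0])  # no pair can differ in more than this many positions
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            dist = hamdist(seqs[i], seqs[j], dmin)
            if dist is not None and dist < dmin:
                dmin = dist
                if dmin == 0:
                    # Identical sequences found: nothing can be smaller.
                    return 0
    return dmin


seqs = ["ACGTACGT", "ACGTTCGT", "TTGTACGA", "ACCTTCGA"]
print(min_hamming_distance(seqs))  # → 1
```

The early exit only shortens the work done per pair; the loop still visits all n(n-1)/2 pairs, so the total running time remains quadratic in n.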