Biopython pairwise alignment causes a segmentation fault when run in a loop


Problem description


I am trying to run the pairwise global alignment method in Biopython in a loop over about 10,000 pairs of strings. Each string is about 20 characters long on average. Running the method on a single pair of sequences works fine, but running it in a loop, for as few as 4 pairs, results in a segmentation fault. How can this be solved?

from Bio import pairwise2


def myTrial(source, targ):
    if source == targ:
        return [source, targ, source]

    alignments = pairwise2.align.globalmx(source, targ, 1, -0.5)
    return alignments


sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']
for i in range(4):
    a = myTrial(sour[i], targ[i])

Recommended answer


The segmentation fault isn't happening because you are using a loop, but because you are providing non-ASCII characters as input to an alignment mode that accepts ASCII string inputs only. Luckily, Bio.pairwise2.align.globalmx also accepts lists whose elements (tokens) are arbitrary strings of ASCII and non-ASCII characters. For example, it can align lists of strings such as ['ABC', 'ABD'] and ['ABC', 'GGG'] to produce alignments like

['ABC', 'ABD', '-'  ]
['ABC', '-'  , 'GGG']
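To display aligned token lists as columns like the ones above, a small helper can pad each column to the width of its longest token. This is a hypothetical formatting helper (`format_token_alignment` is a name chosen here, not part of Biopython); it assumes both aligned rows have the same length and uses `len()` as the display width, which is only approximate for combining or double-width characters.

```python
def format_token_alignment(row_a, row_b):
    """Render two aligned token lists as padded, space-separated columns.

    Hypothetical helper for display only; it does not call Biopython.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Each column is as wide as the longer of its two tokens
    widths = [max(len(a), len(b)) for a, b in zip(row_a, row_b)]
    line_a = " ".join(tok.ljust(w) for tok, w in zip(row_a, widths))
    line_b = " ".join(tok.ljust(w) for tok, w in zip(row_b, widths))
    # Drop the padding added after the last column
    return line_a.rstrip(), line_b.rstrip()


top, bottom = format_token_alignment(["ABC", "ABD", "-"], ["ABC", "-", "GGG"])
print(top)     # ABC ABD -
print(bottom)  # ABC -   GGG
```

Padding per column, rather than joining tokens directly, keeps multi-character tokens such as 'ABD' and 'GGG' visibly separated.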


Or, in your case, it can align lists of single characters, some of them non-ASCII, such as ['ś', 'w', 'i', 'a', 't'] and ['w', 'y', 'r', 'z', 'u', 'c', 'i', 's', 'z'], to produce alignments like

['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-']
['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z']
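To see which of the question's input pairs contain non-ASCII characters, and how `list()` turns a string into the kind of single-character token list shown above, a stdlib-only check can help. This is a sketch; `non_ascii_chars` and `tokenize` are hypothetical names, and the NFC normalization step is an added assumption: without it, an accented letter stored in decomposed form (for example 's' followed by a combining acute accent) would become two tokens instead of one.

```python
import unicodedata

# Inputs copied from the question
sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']


def non_ascii_chars(text):
    """Return the characters of text whose code point is above 127."""
    return [ch for ch in text if ord(ch) > 127]


def tokenize(text):
    """Split text into one token per character after NFC normalization.

    NFC composes sequences such as 's' + U+0301 into the single
    character 'ś', so each accented letter becomes one token.
    """
    return list(unicodedata.normalize("NFC", text))


# Report every input pair that contains non-ASCII characters
flagged = []
for s, t in zip(sour, targ):
    found = non_ascii_chars(s) + non_ascii_chars(t)
    if found:
        flagged.append((s, t, found))
        print(f"{s!r} / {t!r}: {found}")

print(tokenize('świat'))
```

Reading the question's code, the fourth pair ('świat', 'świat') is identical and returns before pairwise2 is called, so the first alignment that actually receives non-ASCII input is the third pair ('wyrzucić', 'wyrzucisz').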


To do this with Biopython, replace this line in your code:

alignments = pairwise2.align.globalmx(source, targ, 1, -0.5)

with

alignments = pairwise2.align.globalmx(list(source), list(targ), 1, -0.5, gap_char=['-'])

Then, for the input

source = 'świat'
targ = 'wyrzucisz'

the modified code produces

[(['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-'],
  ['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z'],
  2.0,
  0,
  12)]

instead of a segmentation fault.
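The score of 2.0 in that result can be checked by hand. In pairwise2's naming, the `m` in `globalmx` means match and mismatch scores are passed as parameters (here 1 and -0.5) and the `x` means gaps cost nothing. The sketch below is a plain-Python Needleman-Wunsch score computation under those assumptions; it is not Biopython's implementation, and `global_score_mx` is a hypothetical name. Because it only compares tokens for equality, it works the same on lists of non-ASCII characters or multi-character strings.

```python
def global_score_mx(seq_a, seq_b, match=1.0, mismatch=-0.5):
    """Best global alignment score with match/mismatch scores and free gaps.

    Plain-Python Needleman-Wunsch sketch; not Biopython's code.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j];
    # row 0 and column 0 stay 0.0 because gaps are free
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two tokens
                score[i - 1][j],             # gap in seq_b (no penalty)
                score[i][j - 1],             # gap in seq_a (no penalty)
            )
    return score[n][m]


print(global_score_mx(list("świat"), list("wyrzucisz")))  # 2.0
```

With free gaps, a mismatch (-0.5) is never better than gapping both tokens (0), so the score reduces to the number of matched tokens: here 'w' and 'i', giving 2.0.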


And since each token in the list is only one character long, you can also convert the resulting aligned lists back into strings using:

new_alignment = []

for aln in alignments:
    # Join the single-character tokens back into strings
    a = ''.join(aln[0])
    b = ''.join(aln[1])

    # Keep the remaining fields (score, begin, end) unchanged
    new_aln = (a, b) + tuple(aln[2:])
    new_alignment.append(new_aln)

In the example above, new_alignment will be

[('św-----iat--', '-wyrzuci--sz', 2.0, 0, 12)]

as desired.
