Problem description
I am trying to generate N- and C-terminal slices of varying length (1, 2, 3, 4, 5, 6, 7), but before I get there I am having problems just reading in my FASTA files. I was following the 'Random subsequences' section of the tutorial at https://biopython.org/wiki/SeqIO. In that example there is only one sequence, so maybe that is where I went wrong. Below are my example sequences, my code, and the error. Any help would be much appreciated. I am clearly out of my depth. It looks like others have run into a lot of similar problems, so I imagine I am doing something stupid because I do not fully understand the SeqRecord structure. Thanks!
Two example sequences from my file domains.fasta:
>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
My code, which does not work:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
# Load data:
domains = list(SeqIO.parse("domains.fa",'fasta'))
#set up receiving arrays
home=[]
num=1
#slice data
for i in range(0, 6):
    num = num+1
    domain = domains
    seq_n = domains.seq[0:num]
    seq_c = domains.seq[len(domain)-num:len(domain)]
    name = domains.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
    home.append(record_n)
    record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
    home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")
The error I get is:
Traceback (most recent call last):
  File "~/fasta_nc_sequences.py", line 20, in <module>
    seq_n = domains.seq[0:num]
AttributeError: 'list' object has no attribute 'SeqRecord'
When I print the result of domains = list(SeqIO.parse("domains.fa",'fasta')), I get this:
[SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE', SingleLetterAlphabet()), id='GA98', name='GA98', description='GA98', dbxrefs=[]), SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE', SingleLetterAlphabet()), id='GB98', name='GB98', description='GB98', dbxrefs=[])]
I am not sure why I cannot access what is inside the SeqRecord. Maybe it is because I wrapped SeqIO.parse in a list, which I did because I was previously getting a different error:
AttributeError: 'generator' object has no attribute 'seq'
Recommended answer
I was working one level too low in my for loop, so I was not iterating through the sequences. There were also problems accessing the C-terminal sequence. Now the code works.
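To illustrate the "one level too low" point, here is a minimal sketch that runs without Biopython. `Record` and `parse_stub` are hypothetical stand-ins for `SeqRecord` and `SeqIO.parse`, assumed only for illustration. It shows that neither the generator nor the list built from it has a `.seq` attribute; only the individual records do, so they have to be reached by indexing or iterating.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord (assumption for illustration only).
Record = namedtuple("Record", ["id", "seq"])


def parse_stub():
    """Yield records one at a time, loosely mimicking an iterator-style parser."""
    yield Record("GA98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE")
    yield Record("GB98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE")


# Neither the generator nor the list made from it carries a .seq attribute.
generator_has_seq = hasattr(parse_stub(), "seq")
domains = list(parse_stub())
list_has_seq = hasattr(domains, "seq")

# Each element of the list is a record, and records do have .seq and .id.
first_has_seq = hasattr(domains[0], "seq")

# Iterating over the list visits every record in turn.
ids = [record.id for record in domains]
```

The corrected script applies this per-record loop to the real SeqRecord objects: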
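from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Load data: SeqIO.parse yields one SeqRecord per FASTA entry
domains = list(SeqIO.parse("examples/data/domains.fa", 'fasta'))
# Set up the list that receives the output records
home = []
# Subset data: loop over the records, not over the list as a whole
for record in domains:
    num = 0
    domain = record.seq
    name = record.id
    # Keep the full-length sequence once per record
    record_d = SeqRecord(domain, '%s' % (name), '', '')
    home.append(record_d)
    # Slice lengths 1 through 6; use range(0, 7) to include length 7
    for i in range(0, 6):
        num = num + 1
        seq_n = record.seq[0:num]
        seq_c = record.seq[len(record.seq)-num:len(record.seq)]
        record_n = SeqRecord(seq_n, '%s_n_%i' % (name, num), '', '')
        home.append(record_n)
        record_c = SeqRecord(seq_c, '%s_c_%i' % (name, num), '', '')
        home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")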
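The terminal slicing on its own can be sketched more compactly. The example below uses a plain Python string in place of a Seq object, on the assumption that Seq slices the same way a string does. `terminal_slices` is a hypothetical helper name, not part of Biopython. For a slice length n between 1 and the sequence length, the negative slice `seq[-n:]` gives the same result as `seq[len(seq)-n:len(seq)]`.

```python
def terminal_slices(seq, max_len):
    """Return (length, n_terminal, c_terminal) tuples for lengths 1..max_len."""
    slices = []
    for n in range(1, max_len + 1):
        # seq[:n] is the first n residues, seq[-n:] is the last n residues
        slices.append((n, seq[:n], seq[-n:]))
    return slices


# GA98 from the example file, as a plain string
seq = "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE"
result = terminal_slices(seq, 7)

# The negative slice matches the explicit len()-based form
same_as_explicit = seq[-3:] == seq[len(seq) - 3:len(seq)]
```

Looping `range(1, max_len + 1)` makes the slice length the loop variable directly, so no separate counter is needed.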