Problem description
I need to download only complete genome sequences from NCBI, in GenBank (full) format. I am interested in 'complete genome' records, not 'whole genome' ones.
My script:
from Bio import Entrez
Entrez.email = "asiakXX@wp.pl"
gatunek='Escherichia[ORGN]'
handle = Entrez.esearch(db='nucleotide',
term=gatunek, property='complete genome' )#title='complete genome[title]')
result = Entrez.read(handle)
As a result I get only small genome fragments, about 484 bp in size:
LOCUS NZ_KE350773 484 bp DNA linear CON 23-AUG-2013
DEFINITION Escherichia coli E1777 genomic scaffold scaffold9_G, whole genome
shotgun sequence.
I know how to do this manually on the NCBI website, but it is very time-consuming. The query I use there is:
escherichia[orgn] AND complete genome[title]
With that query I get multiple genomes, each around 5,154,862 bp in size, and this is what I need to reproduce via Entrez.esearch.
Recommended answer
You've done the hard part and worked out the query,
escherichia[orgn] AND complete genome[title]
so use that same string as the search term in Biopython as well:
from Bio import Entrez
Entrez.email = "asiakXX@wp.pl"
search_term = "escherichia[orgn] AND complete genome[title]"
handle = Entrez.esearch(db='nucleotide', term=search_term)
result = Entrez.read(handle)
handle.close()
print(result['Count'])  # number of records matching the query
Currently that gives me 140 results, starting with 545778205, which matches the website: http://www.ncbi.nlm.nih.gov/nuccore/?term=escherichia%5Borgn%5D+AND+complete+genome%5Btitle%5D
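The search above only returns record IDs, and esearch hands back 20 of them by default, so the actual download still needs a larger retmax and a call to Entrez.efetch. What follows is a minimal sketch of that step, not a definitive implementation. The function names (build_term, batches, download_complete_genomes), the retmax and batch_size values, and the output path are illustrative assumptions, and rettype="gbwithparts" is assumed here to be the efetch equivalent of the website's "GenBank (full)" format.

```python
def build_term(organism, title_phrase="complete genome"):
    """Compose the same query string that works on the NCBI website."""
    return f"{organism}[orgn] AND {title_phrase}[title]"


def batches(ids, size):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def download_complete_genomes(organism, email, out_path,
                              retmax=500, batch_size=20):
    """Search the nucleotide database and save matching GenBank records.

    retmax and batch_size are arbitrary illustrative values.
    """
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a real contact address

    # esearch returns only 20 IDs unless retmax is raised.
    handle = Entrez.esearch(db="nucleotide", term=build_term(organism),
                            retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    ids = list(result["IdList"])

    with open(out_path, "w") as out:
        for chunk in batches(ids, batch_size):
            # Assumption: "gbwithparts" yields the full GenBank flat file,
            # including the sequence.
            fetch = Entrez.efetch(db="nucleotide", id=",".join(chunk),
                                  rettype="gbwithparts", retmode="text")
            out.write(fetch.read())
            fetch.close()
    return len(ids)
```

Calling download_complete_genomes("escherichia", "you@example.org", "escherichia_complete.gbk") would write every matching record to one file and return the number of IDs fetched. Fetching in small batches keeps each request modest, since complete genomes are several megabases each.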