How to download complete genome sequences with Biopython Entrez.esearch

Question

I need to download only complete genome sequences from NCBI, in GenBank (full) format. I am interested in 'complete genome' records, not 'whole genome' ones.

My script:

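from Bio import Entrez
Entrez.email = "asiakXX@wp.pl"
gatunek = 'Escherichia[ORGN]'  # "gatunek" is Polish for "species"
# 'property' is not an ESearch parameter, so it does not restrict the search;
# an earlier attempt with a title filter is left commented out at the end
handle = Entrez.esearch(db='nucleotide',
                        term=gatunek, property='complete genome')  # title='complete genome[title]')
result = Entrez.read(handle)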

As a result, I get only small genome fragments, with sizes of about 484 bp:

LOCUS       NZ_KE350773              484 bp    DNA     linear   CON 23-AUG-2013
DEFINITION  Escherichia coli E1777 genomic scaffold scaffold9_G, whole genome
            shotgun sequence.

I know how to do it manually via the NCBI website, but it is very time-consuming. The query I use there is:

escherichia[orgn] AND complete genome[title]

and as a result I get multiple genomes with sizes of about 5,154,862 bp. This is what I need to reproduce via Entrez.esearch.

Answer

You've done the hard part and worked out the query,

escherichia[orgn] AND complete genome[title]

So use that as the search query via Biopython as well!

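from Bio import Entrez
Entrez.email = "asiakXX@wp.pl"
# The same query that works on the NCBI website
search_term = "escherichia[orgn] AND complete genome[title]"
handle = Entrez.esearch(db='nucleotide', term=search_term)
result = Entrez.read(handle)
handle.close()
# 'Count' is the total number of hits; result['IdList'] holds only the
# first 20 IDs unless retmax is passed to esearch
print(result['Count'])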

Currently that gives me 140 results, starting with 545778205, which is the same as the website: http://www.ncbi.nlm.nih.gov/nuccore/?term=escherichia%5Borgn%5D+AND+complete+genome%5Btitle%5D
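The search above only counts the hits. To actually download the records in GenBank (full) format, as the question asks, the returned IDs can be passed to Entrez.efetch. Below is a minimal sketch, assuming Biopython is installed and that rettype="gbwithparts" (the E-utilities name for "GenBank (full)" with sequence) is the format wanted. The function names build_query, batched and download_complete_genomes, the batch size of 20 and the retmax of 1000 are hypothetical choices for illustration, not part of the original answer.

```python
from typing import Iterator, List


def build_query(organism: str) -> str:
    """Return the query that worked on the website, e.g. 'escherichia[orgn] AND complete genome[title]'."""
    return f"{organism}[orgn] AND complete genome[title]"


def batched(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs so each EFetch request stays small."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def download_complete_genomes(organism: str, email: str, out_path: str,
                              batch_size: int = 20, retmax: int = 1000) -> int:
    """Search 'nucleotide' for complete genomes and save them as GenBank (full) text.

    Needs Biopython and network access. Returns the number of records requested.
    batch_size and retmax are hypothetical defaults chosen for this sketch.
    """
    from Bio import Entrez  # imported lazily so the helpers above work without Biopython

    Entrez.email = email  # NCBI asks every E-utilities user to identify themselves
    # ESearch returns only 20 IDs by default, so ask for up to `retmax` of them
    handle = Entrez.esearch(db="nucleotide", term=build_query(organism), retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    ids = list(result["IdList"])

    with open(out_path, "w") as out:
        for chunk in batched(ids, batch_size):
            # "gbwithparts" = GenBank (full), including the sequence itself
            fetch = Entrez.efetch(db="nucleotide", id=",".join(chunk),
                                  rettype="gbwithparts", retmode="text")
            out.write(fetch.read())
            fetch.close()
    return len(ids)
```

For example, download_complete_genomes("escherichia", "you@example.com", "ecoli_complete.gbk") (the e-mail address and file name are placeholders) would write all records into one file, which Bio.SeqIO.parse(path, "genbank") can read back to check that the sequence lengths are in the megabase range rather than ~484 bp scaffolds.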