Printing a sequence from a FASTA file


Problem description

I often need to find a particular sequence in a FASTA file and print it. For those who don't know, FASTA is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple: each record starts with a line containing the sequence name preceded by a '>', and all the lines that follow, up to the next '>', are the sequence itself. For example:

>sequence1
ACTGACTGACTGACTG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
ACTGACTGACTGACTG

The way I'm currently getting the sequence I need is to use grep with -A, so I'll run

grep -A 10 sequence_name filename.fa

and then, if I don't see the start of the next sequence in the output, I'll change the 10 to 20 and repeat until I'm sure I'm getting the whole sequence.

It seems like there should be a better way to do this. For example, can I ask it to print everything up to the next '>' character?
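To make the "print until the next '>'" idea concrete, here is a minimal sketch using awk rather than grep. It is only one possible approach, not necessarily the one any answer recommends. The function name `print_fasta_record` and the variable `printing` are made up for illustration, and the sketch assumes the sequence name is the first whitespace-delimited word on the header line, directly after the '>'.

```shell
# Hypothetical helper: print one FASTA record (header plus sequence lines).
# Usage: print_fasta_record NAME FILE
print_fasta_record() {
    awk -v name="$1" '
        # On every header line, turn printing on only if the first word
        # (minus the leading ">") is exactly the requested name.
        /^>/ { printing = (substr($1, 2) == name) }
        # Print every line while the flag is set; the next header resets it.
        printing
    ' "$2"
}
```

With the example file above, `print_fasta_record sequence2 filename.fa` prints the `>sequence2` header and the two sequence lines that follow it, however long the record is, so there is no line count to guess. Unlike the grep call, the name has to match the whole first word of the header, so `sequence1` would not also pick up a record named `sequence10`.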


Recommended answer