Problem description
I have just started working with Perl and I have a question. I have a PHYLIP file and I need to convert it to FASTA. I have started writing a script: first I removed the spaces within the lines, and now I need to re-wrap the sequences so that every line contains 60 amino acids, with each sequence identifier printed on its own line. Could someone give me some advice?
Recommended answer
The BioPerl Bio::AlignIO module might help. It supports the PHYLIP sequence format:
phylip2fasta.pl
use strict;
use warnings;
use Bio::AlignIO;

# http://doc.bioperl.org/bioperl-live/Bio/AlignIO.html
# http://doc.bioperl.org/bioperl-live/Bio/AlignIO/phylip.html
# http://www.bioperl.org/wiki/PHYLIP_multiple_alignment_format

my ($inputfilename) = @ARGV;
die "must provide phylip file as 1st parameter...\n" unless $inputfilename;

# Read the alignment as interleaved PHYLIP
my $in = Bio::AlignIO->new(-file        => $inputfilename,
                           -format      => 'phylip',
                           -interleaved => 1);

# Write FASTA to standard output
my $out = Bio::AlignIO->new(-fh     => \*STDOUT,
                            -format => 'fasta');

while ( my $aln = $in->next_aln() ) {
    $out->write_aln($aln);
}
$ perl phylip2fasta.pl test.phylip
>Turkey/1-42
AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT
>Salmo_gair/1-42
AAGCCTTGGCAGTGCAGGGTGAGCCGTGGCCGGGCACGGTAT
>H._Sapiens/1-42
ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA
>Chimp/1-42
AAACCCTTGCCGTTACGCTTAAACCGAGGCCGGGACACTCAT
>Gorilla/1-42
AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA
test.phylip (from http://evolution.genetics.washington.edu/phylip/doc/sequence.html)
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
Salmo gairAAGCCTTGGC AGTGCAGGGT
H. SapiensACCGGTTGGC CGTTCAGGGT
Chimp     AAACCCTTGC CGTTACGCTT
Gorilla   AAACCCTTGC CGGTACGCTT

GAGCCCGGGC AATACAGGGT AT
GAGCCGTGGC CGGGCACGGT AT
ACAGGTTGGC CGTTCAGGGT AA
AAACCGAGGC CGGGACACTC AT
AAACCATTGC CGGTACGCTT AA
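If BioPerl is not available, the conversion can also be sketched in plain Perl. The following is a minimal sketch, not a full PHYLIP parser: it assumes an interleaved file in which each name occupies exactly the first 10 columns of its line (the strict PHYLIP layout), and it wraps each sequence at 60 characters as asked in the question. The names parse_phylip and to_fasta are made up for this illustration, and replacing spaces in names with underscores is a choice made here so that each FASTA identifier stays a single token.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: parse interleaved PHYLIP text.
# Assumption: names fill exactly the first 10 columns of each line
# in the first block. Returns a list of [name, sequence] pairs.
sub parse_phylip {
    my ($text) = @_;
    $text =~ s/\r//g;                              # tolerate CRLF line ends
    my @lines = grep { /\S/ } split /\n/, $text;   # drop blank lines

    my $header = shift @lines;
    defined $header or die "empty PHYLIP input\n";
    my ($ntax, $nchar) = $header =~ /^\s*(\d+)\s+(\d+)/
        or die "bad PHYLIP header: $header\n";
    @lines >= $ntax or die "expected $ntax sequences in the first block\n";

    my (@names, @seqs);
    # First block: 10-column name followed by sequence data.
    for my $line (splice @lines, 0, $ntax) {
        (my $name = substr($line, 0, 10)) =~ s/^\s+|\s+$//g;
        (my $seq = length($line) > 10 ? substr($line, 10) : '') =~ s/\s+//g;
        push @names, $name;
        push @seqs,  $seq;
    }

    # Later blocks: sequence data only, in the same order as the names.
    my $i = 0;
    for my $line (@lines) {
        (my $chunk = $line) =~ s/\s+//g;
        $seqs[$i++ % $ntax] .= $chunk;
    }

    # Check every sequence against the length given in the header.
    for my $j (0 .. $#seqs) {
        length($seqs[$j]) == $nchar
            or die "sequence '$names[$j]' has length "
                 . length($seqs[$j]) . ", expected $nchar\n";
    }
    return map { [ $names[$_], $seqs[$_] ] } 0 .. $#names;
}

# Hypothetical helper: format [name, sequence] pairs as FASTA text,
# with at most $width residues per sequence line.
sub to_fasta {
    my ($width, @records) = @_;
    my $out = '';
    for my $rec (@records) {
        my ($name, $seq) = @$rec;
        (my $id = $name) =~ s/\s+/_/g;             # identifier on its own line
        $out .= ">$id\n";
        $out .= "$_\n" for $seq =~ /(.{1,$width})/g;
    }
    return $out;
}

# Convert the file named on the command line, if one was given.
if (!caller && @ARGV) {
    my ($file) = @ARGV;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    print to_fasta(60, parse_phylip($text));
}
```

Saved under a name of your choice, it would be run the same way as the BioPerl script, with the PHYLIP file as the first argument. Unlike the BioPerl output above, the headers carry only the name, without a /start-end range.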