Given a file with aligned sequencing reads and a list of genomic features, a common task is to count how many reads map to each feature.
加入某个样有以下bam文件:
RCS48-1_CTTGTA_L003.sorted.rmp.bam
RCS48-2_CCGTCC_L005.sorted.rmp.bam
RCS48-2_CCGTCC_L007.sorted.rmp.bam
RCS48-3_GTCCGC_L006.sorted.rmp.bam
RCS48-3_GTCCGC_L007.sorted.rmp.bam
先将属于这个样的bam文件merge:
命令:
samtools merge RCS48.merged.bam RCS48-1_CTTGTA_L003.sorted.rmp.bam RCS48-2_CCGTCC_L005.sorted.rmp.bam RCS48-2_CCGTCC_L007.sorted.rmp.bam RCS48-3_GTCCGC_L006.sorted.rmp.bam RCS48-3_GTCCGC_L007.sorted.rmp.bam
merge后得到RCS48.merged.bam文件,
对合并好的bam文件重新进行sort, 按照read name 而不是coordinate,
命令:
samtools sort -n RCS48.merged.bam RCS48.merged.namesorted
sort之后得到RCS48.merged.namesorted.bam
如果你是老版本的htseq-count, 要将上面的bam文件转化为sam文件:
命令:
samtools view -h RCS48.merged.namesorted.bam > RCS48.merged.namesorted.sam
最后执行htseq-count
命令:
htseq-count -s no RCS48.merged.namesorted.sam ../../refseq/Osativa_204.gtf > RCS48_count
由于我的bam经过了remove duplicate步骤,所以会出现以下warning:
Warning: Read HWI-D00258:40:C2D73ACXX:2:2316:20611:80446 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
33638504 sam line pairs processed.
更详细阅读:
http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
by freemao
FAFU.