Given a file with aligned sequencing reads and a list of genomic features, a common task is to count how many reads map to each feature.

加入某个样有以下bam文件:

RCS48-1_CTTGTA_L003.sorted.rmp.bam

RCS48-2_CCGTCC_L005.sorted.rmp.bam

RCS48-2_CCGTCC_L007.sorted.rmp.bam

RCS48-3_GTCCGC_L006.sorted.rmp.bam

RCS48-3_GTCCGC_L007.sorted.rmp.bam

先将属于这个样的bam文件merge:

命令:

samtools merge RCS48.merged.bam RCS48-1_CTTGTA_L003.sorted.rmp.bam RCS48-2_CCGTCC_L005.sorted.rmp.bam RCS48-2_CCGTCC_L007.sorted.rmp.bam RCS48-3_GTCCGC_L006.sorted.rmp.bam RCS48-3_GTCCGC_L007.sorted.rmp.bam

merge后得到RCS48.merged.bam文件,

对合并好的bam文件重新进行sort, 按照read name 而不是coordinate,

命令:

samtools sort -n RCS48.merged.bam RCS48.merged.namesorted

sort之后得到RCS48.merged.namesorted.bam

如果你是老版本的htseq-count, 要将上面的bam文件转化为sam文件:

命令:

samtools view -h RCS48.merged.namesorted.bam > RCS48.merged.namesorted.sam

最后执行htseq-count

命令:

htseq-count -s no RCS48.merged.namesorted.sam ../../refseq/Osativa_204.gtf > RCS48_count

由于我的bam经过了remove duplicate步骤,所以会出现以下warning:

Warning: Read HWI-D00258:40:C2D73ACXX:2:2316:20611:80446 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
33638504 sam line pairs processed.

更详细阅读:

http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

by freemao

FAFU.

[email protected]

05-02 05:28