Detailed usage documentation: http://bedtools.readthedocs.org/en/latest/
Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
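As a rough illustration of combining operations on the command line, the sketch below intersects two small BED files and then merges the overlapping pieces. The file names a.bed, b.bed, and overlaps.merged.bed and their contents are made up for this example, and the block only calls bedtools if it is found on PATH.

```shell
#!/bin/sh
# Hypothetical toy inputs: two tab-separated BED files (chrom, start, end).
printf 'chr1\t100\t200\nchr1\t150\t300\nchr1\t500\t600\n' > a.bed
printf 'chr1\t180\t250\nchr1\t550\t580\n' > b.bed

if command -v bedtools >/dev/null 2>&1; then
    # Keep the parts of A that overlap B, sort them, then merge
    # overlapping pieces into single intervals.
    bedtools intersect -a a.bed -b b.bed \
        | bedtools sort -i stdin \
        | bedtools merge -i stdin > overlaps.merged.bed
    cat overlaps.merged.bed
else
    echo "bedtools not found on PATH; skipping" >&2
fi
```

Each tool reads from standard input here, so no intermediate files are needed between the steps.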
Summary of available tools.
bedtools supports a wide range of operations for interrogating and manipulating genomic features. The table below summarizes the tools available in the suite.
annotate | Annotate coverage of features from multiple files. |
bamtobed | Convert BAM alignments to BED (& other) formats. |
bamtofastq | Convert BAM records to FASTQ records. |
bed12tobed6 | Breaks BED12 intervals into discrete BED6 intervals. |
bedpetobam | Convert BEDPE intervals to BAM records. |
bedtobam | Convert intervals to BAM records. |
closest | Find the closest, potentially non-overlapping interval. |
cluster | Cluster (but don’t merge) overlapping/nearby intervals. |
complement | Extract intervals _not_ represented by an interval file. |
coverage | Compute the coverage over defined intervals. |
expand | Replicate lines based on lists of values in columns. |
flank | Create new intervals from the flanks of existing intervals. |
genomecov | Compute the coverage over an entire genome. |
getfasta | Use intervals to extract sequences from a FASTA file. |
groupby | Group by common cols. & summarize oth. cols. (~ SQL “groupBy”) |
igv | Create an IGV snapshot batch script. |
intersect | Find overlapping intervals in various ways. |
jaccard | Calculate the Jaccard statistic b/w two sets of intervals. |
links | Create a HTML page of links to UCSC locations. |
makewindows | Make interval “windows” across a genome. |
map | Apply a function to a column for each overlapping interval. |
maskfasta | Use intervals to mask sequences from a FASTA file. |
merge | Combine overlapping/nearby intervals into a single interval. |
multicov | Counts coverage from multiple BAMs at specific intervals. |
multiinter | Identifies common intervals among multiple interval files. |
nuc | Profile the nucleotide content of intervals in a FASTA file. |
overlap | Computes the amount of overlap from two intervals. |
pairtobed | Find pairs that overlap intervals in various ways. |
pairtopair | Find pairs that overlap other pairs in various ways. |
random | Generate random intervals in a genome. |
reldist | Calculate the distribution of relative distances b/w two files. |
shuffle | Randomly redistribute intervals in a genome. |
slop | Adjust the size of intervals. |
sort | Order the intervals in a file. |
subtract | Remove intervals based on overlaps b/w two files. |
tag | Tag BAM alignments based on overlaps with interval files. |
unionbedg | Combines coverage intervals from multiple BEDGRAPH files. |
window | Find overlapping intervals within a window around an interval. |
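Several of the tools listed above (for example complement and makewindows) take a "genome file" giving each chromosome's name and size. The following is a minimal sketch under assumptions: toy.genome and toy.bed are invented files with invented contents, and bedtools is only run if it is available on PATH.

```shell
#!/bin/sh
# Hypothetical toy inputs: a genome file (chrom, size) and a sorted BED file.
printf 'chr1\t1000\n' > toy.genome
printf 'chr1\t100\t200\nchr1\t400\t450\n' > toy.bed

if command -v bedtools >/dev/null 2>&1; then
    # Regions of chr1 that are not covered by any interval in toy.bed.
    bedtools complement -i toy.bed -g toy.genome
    # Fixed-width 500 bp windows tiled across the toy genome.
    bedtools makewindows -g toy.genome -w 500
else
    echo "bedtools not found on PATH; skipping" >&2
fi
```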
Installation: yum install BEDTools
1. Convert BAM files (TopHat output) to FASTQ
First, merge the accepted_hits.bam and unmapped.bam files produced by the alignment:
samtools merge RC6-1_ATTCCT_L005.bam accepted_hits.bam unmapped.bam
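This produces the merged file RC6-1_ATTCCT_L005.bam.
Next, sort this BAM file by read name, so that the two mates of each pair are adjacent (bedtools bamtofastq expects this when writing paired output):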
samtools_0.1.18 sort -n RC6-1_ATTCCT_L005.bam RC6-1_ATTCCT_L005.sorted
This produces RC6-1_ATTCCT_L005.sorted.bam.
Finally, convert it with bedtools:
bedtools bamtofastq -i RC6-1_ATTCCT_L005.sorted.bam -fq RC6-1_ATTCCT_L005_R1.fastq -fq2 RC6-1_ATTCCT_L005_R2.fastq
This produces the paired-end FASTQ files.
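The three steps above can be collected into one script. This is only a sketch under assumptions: the function name bam_to_fastq and the SAMPLE variable are invented for illustration, and the sort call keeps the legacy samtools 0.1.x syntax used above (input BAM followed by an output prefix). Newer samtools releases (1.3 and later) expect `samtools sort -n -o out.bam in.bam` instead, so adjust that line to match the installed version.

```shell
#!/bin/sh
# Sketch: merge TopHat outputs, name-sort, and convert to paired FASTQ.
# bam_to_fastq and SAMPLE are illustrative names, not part of bedtools or samtools.
set -e

bam_to_fastq() {
    sample=$1
    # 1. Merge mapped and unmapped reads into one BAM file.
    samtools merge "${sample}.bam" accepted_hits.bam unmapped.bam
    # 2. Sort by read name so that mates are adjacent
    #    (legacy samtools 0.1.x syntax: input BAM, then output prefix).
    samtools sort -n "${sample}.bam" "${sample}.sorted"
    # 3. Write mate 1 and mate 2 to separate FASTQ files.
    bedtools bamtofastq -i "${sample}.sorted.bam" \
        -fq "${sample}_R1.fastq" -fq2 "${sample}_R2.fastq"
}

SAMPLE=RC6-1_ATTCCT_L005
if command -v samtools >/dev/null 2>&1 \
   && command -v bedtools >/dev/null 2>&1 \
   && [ -f accepted_hits.bam ] && [ -f unmapped.bam ]; then
    bam_to_fastq "$SAMPLE"
else
    echo "tools or input BAM files missing; nothing done" >&2
fi
```

Run it from the TopHat output directory that contains accepted_hits.bam and unmapped.bam; if either the tools or the inputs are missing, the script does nothing.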