Gap-filling tools

In fact, many tools can now use sequencing reads directly to fill stretches of N bases in a sequence.

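For a gap region, we can therefore first join the flanking sequences with a run of N bases (the number of Ns can be estimated from the reference genome) and then try to fill the gap.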
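The joining step can be sketched in a few lines of Python. This is only an illustration: the function name `join_with_gap`, the example contigs, and the gap length of 10 are all hypothetical, and in practice the gap length would be estimated from where the two contigs align on the reference genome.

```python
def join_with_gap(left: str, right: str, gap_len: int) -> str:
    """Join two contig sequences with a run of N bases as a placeholder gap."""
    if gap_len < 0:
        raise ValueError("gap_len must be non-negative")
    return left + "N" * gap_len + right


# Hypothetical contigs and gap size, for illustration only.
scaffold = join_with_gap("ACGTACGT", "TTGACCA", gap_len=10)
print(scaffold)
```

The resulting sequence, written out as FASTA, is the kind of input a gap-filling tool expects: known sequence on both sides and Ns marking the region to be closed.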

One example is GapCloser from the SOAPdenovo suite. It is worth a try and can fill a good share of the N gaps.

# Fill gaps with GapCloser (config is the SOAPdenovo library file)
GapCloser -b config -a input.fasta -o output.fasta


Edit the config file:
#maximal read length
max_rd_len=149
[LIB]
#average insert size
avg_ins=408
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=4
#use only first 100 bps of each read
rd_len_cutoff=100
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#a pair of fastq file, read 1 file should always be followed by read 2 file
q1=/mnt/e/linux/experiment/data/fastp/corrected_cutf_uni_10m_1.fastq
q2=/mnt/e/linux/experiment/data/fastp/corrected_cutf_uni_10m_2.fastq
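To see how many N gaps were actually filled, one option is to count the N runs in the FASTA file before and after running GapCloser. The following is a minimal sketch, not part of GapCloser itself: the helper names `read_fasta` and `gap_stats` are made up for this example, and it assumes plain uncompressed FASTA in which any run of `N` or `n` counts as one gap.

```python
import re
import sys


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_stats(handle):
    """Return (number of N runs, total number of N bases) over all records."""
    n_gaps = 0
    n_bases = 0
    for _, seq in read_fasta(handle):
        runs = re.findall(r"[Nn]+", seq)
        n_gaps += len(runs)
        n_bases += sum(len(run) for run in runs)
    return n_gaps, n_bases


if __name__ == "__main__":
    # Usage: python gap_stats.py input.fasta output.fasta
    for path in sys.argv[1:]:
        with open(path) as fh:
            gaps, bases = gap_stats(fh)
        print(f"{path}\t{gaps} gaps\t{bases} N bases")
```

Running it on both `input.fasta` and `output.fasta` from the command above gives a quick before/after comparison. The script name `gap_stats.py` is only a suggestion.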



Sources:
http://blog.sciencenet.cn/home.php?mod=space&uid=3406804&do=blog&id=1198915
https://www.jianshu.com/p/a31859443fce
12-21 01:45