Problem description
I want to loop over files like these, where the files with the same Sample_ID have to be used together:
Sample_51770BL1_R1.fastq.gz
Sample_51770BL1_R2.fastq.gz
Sample_52412_R1.fastq.gz
Sample_52412_R2.fastq.gz
For example, Sample_51770BL1_R1.fastq.gz and Sample_51770BL1_R2.fastq.gz are used together in one command to create one output.
Similarly, Sample_52412_R1.fastq.gz and Sample_52412_R2.fastq.gz are used together to create another output.
I want to write a for loop in bash that iterates over the pairs and creates the output. For a single pair, the command looks like this:
sourcedir=/sourcepath/
destdir=/destinationpath/
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta Sample_52412_R1.fastq.gz Sample_52412_R2.fastq.gz>$destdir/Sample_52412_R1_R2.sam
How should I pattern match the file names Sample_ID_R1 and Sample_ID_R2 so that both are used in one command?
Thanks
Recommended answer
for fname in *_R1.fastq.gz
do
base=${fname%_R1*}
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam"
done
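To see what the ${fname%_R1*} expansion in the loop above does before running bwa on real data, here is a minimal dry-run sketch. The pair_base function is a hypothetical helper introduced only for illustration, and the file names are the ones from the question, passed as plain strings so no files need to exist.

```bash
#!/bin/bash
# pair_base is a hypothetical helper, not part of the original answer.
# It strips the shortest trailing match of "_R1*" to recover the sample prefix.
pair_base() {
    printf '%s\n' "${1%_R1*}"
}

for fname in Sample_51770BL1_R1.fastq.gz Sample_52412_R1.fastq.gz
do
    base=$(pair_base "$fname")
    # Print the pair instead of running bwa, so the matching can be checked first.
    echo "$base: ${base}_R1.fastq.gz + ${base}_R2.fastq.gz"
done
```

Because the loop only globs the _R1 files and derives the _R2 name from the shared prefix, each pair is visited exactly once.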
In the comments, you ask about running several, but not too many, jobs in parallel. Below is my first stab at that:
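#!/bin/bash
# Limit background jobs to no more than $maxproc at once.
# Assumes destdir is already set, as in the question.
maxproc=3
for fname in *_R1.fastq.gz
do
# Poll until fewer than $maxproc jobs are still running.
while [ "$(jobs -r | wc -l)" -ge "$maxproc" ]
do
sleep 1
done
base=${fname%_R1*}
echo "starting new job with ongoing=$(jobs -r | wc -l)"
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam" &
done
# Wait for the last jobs to finish before the script exits.
wait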
The optimal value of maxproc will depend on how many processors your PC has. Keep in mind that each bwa job already runs with 4 threads (-t 4), so the total thread count is roughly 4 times maxproc. You may need to experiment to find what works best.
Note that the job-limiting script above uses jobs, which is a bash builtin. Thus, it has to be run under bash, not dash, which is the default shell for scripts on Debian-like distributions.
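As a possible starting point for choosing maxproc, the sketch below divides the core count by the number of threads each job uses. The jobs_for function is a hypothetical helper, and nproc is assumed to be available (it ships with GNU coreutils); memory and disk throughput can also limit how many bwa jobs run well at once, so treat the result as a first guess rather than a rule.

```bash
#!/bin/bash
# jobs_for CORES THREADS_PER_JOB (hypothetical helper):
# prints how many jobs fit on CORES cores, but never less than 1.
jobs_for() {
    n=$(( $1 / $2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Assumption: nproc exists on this system; fall back to 1 core if it does not.
cores=$(nproc 2>/dev/null || echo 1)
# 4 matches the -t 4 thread count used in the bwa command.
maxproc=$(jobs_for "$cores" 4)
echo "maxproc=$maxproc"
```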