Looping over files in bash

Question

I want to loop over files like these, where the files with the same Sample_ID have to be used together:

Sample_51770BL1_R1.fastq.gz
Sample_51770BL1_R2.fastq.gz

Sample_52412_R1.fastq.gz
Sample_52412_R2.fastq.gz

For example, Sample_51770BL1_R1.fastq.gz and Sample_51770BL1_R2.fastq.gz are used together in one command to create an output.

Similarly, Sample_52412_R1.fastq.gz and Sample_52412_R2.fastq.gz are used together to create an output.

I want to write a for loop in bash that iterates over the files and creates the output for each pair.

sourcedir=/sourcepath/
destdir=/destinationpath/


bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta Sample_52412_R1.fastq.gz  Sample_52412_R2.fastq.gz>$destdir/Sample_52412_R1_R2.sam

How should I pattern match the file names Sample_ID_R1 and Sample_ID_R2 so that both are used in one command?

Thanks

Recommended answer

for fname in *_R1.fastq.gz
do
    base=${fname%_R1*}
    bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz"  "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam"
done
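
To see what the loop above would do before running the aligner, a dry run can help. The following is a minimal sketch, not part of the original answer: it creates empty placeholder files in a scratch directory, strips the _R1.fastq.gz suffix with the same kind of parameter expansion, and prints the command for each pair instead of executing it. The names workdir, pairs, and the sample Sample_99999 (which deliberately has no R2 mate) are assumptions made for illustration.

```shell
#!/bin/bash
# Dry run: build empty placeholder files, then print the command that
# would be run for each R1/R2 pair instead of executing it.
workdir=$(mktemp -d)
touch "$workdir"/Sample_51770BL1_R1.fastq.gz "$workdir"/Sample_51770BL1_R2.fastq.gz \
      "$workdir"/Sample_52412_R1.fastq.gz "$workdir"/Sample_52412_R2.fastq.gz \
      "$workdir"/Sample_99999_R1.fastq.gz   # hypothetical sample with no R2 mate

pairs=()
for r1 in "$workdir"/*_R1.fastq.gz
do
    base=${r1%_R1.fastq.gz}      # remove the suffix, keeping path and sample ID
    r2=${base}_R2.fastq.gz
    if [ ! -e "$r2" ]; then
        # Skip samples whose second read file is missing.
        echo "skipping ${r1##*/}: no matching R2 file" >&2
        continue
    fi
    sample=${base##*/}           # drop the directory part
    pairs+=("$sample")
    echo "bwa mem -t 4 human_g1k_v37.fasta ${r1##*/} ${r2##*/} > ${sample}_R1_R2.sam"
done
rm -rf "$workdir"
```

Checking for the R2 file first avoids handing the aligner a file name that does not exist, which the plain loop would do for an unpaired sample.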

In the comments, you ask about running several, but not too many, jobs in parallel. Below is my first stab at that:

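#!/bin/bash
# Limit background jobs to no more than $maxproc at once.
maxproc=3

# Loop over the R1 files only; each R2 file is derived from its R1 name.
for fname in *_R1.fastq.gz
do
    # Wait until fewer than $maxproc jobs are still running.
    # jobs -r lists running jobs only, so finished ones are not counted.
    while [ "$(jobs -r | wc -l)" -ge "$maxproc" ]
    do
        sleep 1
    done
    base=${fname%_R1*}
    echo "starting new job with ongoing=$(jobs -r | wc -l)"
    bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam" &
done

# Wait for the remaining background jobs before the script exits.
wait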

The optimal value of maxproc will depend on how many processors your PC has. You may need to experiment to find what works best.
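
As a starting point for that experiment, one option is to derive maxproc from the core count. The sketch below is an assumption of mine rather than part of the original answer: it divides the number of online cores by the four threads each bwa mem job requests with -t 4, and never goes below one job. The variable names threads_per_job and cores are illustrative, and nproc is a GNU coreutils tool, so a getconf fallback is included for systems without it. Memory and disk throughput may still make a smaller value faster.

```shell
#!/bin/bash
threads_per_job=4     # matches the -t 4 passed to bwa mem

# Number of online cores: nproc where available, otherwise getconf.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Run as many jobs as fit without oversubscribing the cores, but at least one.
maxproc=$(( cores / threads_per_job ))
if [ "$maxproc" -lt 1 ]; then
    maxproc=1
fi

echo "cores=$cores threads_per_job=$threads_per_job maxproc=$maxproc"
```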

Note that the above script uses jobs, which is a bash builtin. It therefore has to be run under bash, not dash, which is the default /bin/sh on Debian-like distributions.
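
Since the script needs bash anyway, the sleep polling could be replaced by wait -n, which blocks until any one background job exits. This is a sketch of an alternative, not the answer's own method, and it assumes bash 4.3 or newer, where wait -n was introduced. The sleep 0.2 command stands in for the real bwa call, and the counters started and peak exist only to show that the cap holds.

```shell
#!/bin/bash
# Cap concurrency with `wait -n` (bash 4.3+) instead of polling with sleep.
maxproc=3
started=0
peak=0

for i in 1 2 3 4 5 6 7          # stand-ins for the sample base names
do
    # When the cap is reached, block until any one background job exits.
    while [ "$(jobs -r | wc -l)" -ge "$maxproc" ]
    do
        wait -n
    done
    sleep 0.2 &                 # placeholder for the real bwa command
    started=$((started + 1))
    running=$(jobs -r | wc -l)
    if [ "$running" -gt "$peak" ]; then
        peak=$running
    fi
done

wait                            # let the last jobs finish
echo "started=$started peak=$peak"
```

Compared with sleep 1, this starts the next job as soon as a slot frees up, which matters little for long alignments but avoids idle time when jobs are short.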

09-05 17:20