Run a Snakemake rule one sample at a time

Question

I'm creating a Snakemake workflow that wraps some of the tools in the NVIDIA Clara Parabricks pipelines. Because these tools run on GPUs, they can typically handle only one sample at a time; otherwise the GPU runs out of memory. However, Snakemake pushes all the samples through to Parabricks at once, seemingly unaware of the GPU memory limits. One solution would be to tell Snakemake to process one sample at a time, hence the question:

How do I get Snakemake to process one sample at a time?

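Because Parabricks is a licensed product (and therefore not necessarily reproducible), I'll show both an example of the Parabricks rule I'm trying to run (pbrun fq2bam) and a minimal example using open-source software (fastqc) that we can study and work from.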

Snakefile:

# Define samples from fastq dir using wildcards
SAMPLES, = glob_wildcards("../fastq/{sample}_1.filt.fastq.gz")

rule all:
    input:
        expand("{sample}_recalibrated.bam", sample = SAMPLES)

rule pbrun_fq2bam:
    input:
        R1 = "../fastq/{sample}_1.filt.fastq.gz",
        R2 = "../fastq/{sample}_2.filt.fastq.gz"
    output:
        bam = "{sample}_recalibrated.bam",
        recal = "{sample}_recal.txt"
    shell:
        "pbrun fq2bam --ref human_g1k_v37_decoy.fasta --in-fq {input.R1} {input.R2} --knownSites dbsnp_138.b37.vcf --out-bam {output.bam} --out-recal {output.recal}"

Run command:

snakemake -j 32 --use-conda

Error when four samples/exomes are present in the ../fastq/ directory:

GPU-BWA mem
ProgressMeter   Reads           Base Pairs Aligned
cudaSafeCall() failed at ParaBricks/src/samGenerator.cu:782 : out of memory
cudaSafeCall() failed at ParaBricks/src/samGenerator.cu:782 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:185 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:185 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:185 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:183 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:185 : out of memory
cudaSafeCall() failed at ParaBricks/src/chainGenerator.cu:183 : out of memory

Minimal example - fastqc

Get the data:

mkdir ../fastq/
gsutil cp -r gs://genomics-public-data/gatk-examples/example1/NA19913/* ../fastq/

Snakefile:

SAMPLES, = glob_wildcards("../fastq/{sample}_1.filt.fastq.gz")

rule all:
    input:
        expand(["{sample}_1.filt_fastqc.html", "{sample}_2.filt_fastqc.html"], sample = SAMPLES),
        expand(["{sample}_1.filt_fastqc.zip", "{sample}_2.filt_fastqc.zip"], sample = SAMPLES)

rule fastqc:
    input:
        R1 = "../fastq/{sample}_1.filt.fastq.gz",
        R2 = "../fastq/{sample}_2.filt.fastq.gz"
    output:
        html = ["{sample}_1.filt_fastqc.html", "{sample}_2.filt_fastqc.html"],
        zip = ["{sample}_1.filt_fastqc.zip", "{sample}_2.filt_fastqc.zip"]
    conda:
        "fastqc.yaml"
    shell:
        "fastqc {input.R1} {input.R2} --outdir ."

fastqc.yaml:

channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bioconda::fastqc =0.11.9

Run command:

snakemake -j 32 --use-conda

Thanks in advance for any pointers!

Recommended answer

I would like to expand on @jafors's answer. Rather than limiting memory, it is probably better to define a gpu resource:

rule pbrun_fq2bam:
...
    resources:
        gpu=1

Then run Snakemake with --resources gpu=1, so that at most one job requesting the gpu resource runs at any time.
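As a sketch of how the pieces fit together, here is the question's pbrun_fq2bam rule with the resources block added, with the invocation shown as a comment at the end. It assumes the same directory layout and reference files as the question. The name gpu is a label chosen by the user; in local execution Snakemake treats it as a plain counter and does not inspect the actual hardware.

```python
# Snakefile (Snakemake's Python-based DSL); a sketch based on the question's rule.
SAMPLES, = glob_wildcards("../fastq/{sample}_1.filt.fastq.gz")

rule all:
    input:
        expand("{sample}_recalibrated.bam", sample = SAMPLES)

rule pbrun_fq2bam:
    input:
        R1 = "../fastq/{sample}_1.filt.fastq.gz",
        R2 = "../fastq/{sample}_2.filt.fastq.gz"
    output:
        bam = "{sample}_recalibrated.bam",
        recal = "{sample}_recal.txt"
    resources:
        gpu = 1  # each job of this rule claims one unit of the "gpu" resource
    shell:
        "pbrun fq2bam --ref human_g1k_v37_decoy.fasta --in-fq {input.R1} {input.R2} --knownSites dbsnp_138.b37.vcf --out-bam {output.bam} --out-recal {output.recal}"

# Invocation: cap the total "gpu" units available across all running jobs.
#   snakemake -j 32 --use-conda --resources gpu=1
```

The limit only takes effect when it is passed on the command line; without --resources gpu=1 the declared resource is not constrained and all samples may start together again.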

This way you can still use memory and threads for other rules, and each resource name describes what it actually limits.
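To illustrate why a dedicated gpu resource serializes only the GPU jobs while other rules keep running in parallel, here is a toy model in plain Python. It is not Snakemake's actual scheduler; Job, schedule_rounds, and the job names are hypothetical and exist only for this illustration. Jobs are packed greedily into rounds so that no round exceeds the core count or any declared resource limit, and resources without a declared limit are left unconstrained.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for one rule instance (one sample)."""
    name: str
    resources: dict = field(default_factory=dict)  # e.g. {"gpu": 1}
    cores: int = 1


def schedule_rounds(jobs, max_cores, limits):
    """Greedily pack jobs into rounds that respect max_cores and the limits.

    Toy model only: a resource that has no entry in `limits` is not constrained.
    """
    pending = list(jobs)
    rounds = []
    while pending:
        used_cores = 0
        used = {name: 0 for name in limits}
        batch, rest = [], []
        for job in pending:
            fits_cores = used_cores + job.cores <= max_cores
            fits_resources = all(
                used[res] + amount <= limits[res]
                for res, amount in job.resources.items()
                if res in limits
            )
            if fits_cores and fits_resources:
                batch.append(job.name)
                used_cores += job.cores
                for res, amount in job.resources.items():
                    if res in limits:
                        used[res] += amount
            else:
                rest.append(job)
        if not batch:
            raise ValueError("a pending job can never fit within the limits")
        rounds.append(batch)
        pending = rest
    return rounds


if __name__ == "__main__":
    samples = ["A", "B", "C", "D"]
    jobs = [Job(f"pbrun_fq2bam_{s}", {"gpu": 1}) for s in samples]
    jobs += [Job(f"fastqc_{s}") for s in samples]
    # Analogous to: snakemake -j 32 --resources gpu=1
    for number, batch in enumerate(schedule_rounds(jobs, 32, {"gpu": 1}), 1):
        print(number, batch)
```

In this model the four fastqc jobs share the first round with a single GPU job, and the remaining GPU jobs follow one per round; with an empty limits dictionary all eight jobs land in one round, which mirrors the out-of-memory situation in the question.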

07-31 23:54