Snakemake and pandas syntax: getting sample-specific parameters from the sample table


Question

First of all, this could be a duplicate of "Snakemake and pandas syntax". However, I'm still confused, so I'd like to explain again.

In Snakemake I have loaded a sample table with several columns. One of the columns is called 'Read1' and contains sample-specific read lengths. I would like to get this value for every sample separately, as it may differ.

What I expected to work is something like this:

rule mismatch_profile:
    input:
        rseqc_input_bam
    output:
        os.path.join(rseqc_dir, '{sample}.mismatch_profile.xls')
    conda:
        "../envs/rseqc.yaml"
    params:
        read_length = samples.loc['{sample}']['Read1']
    shell:
        '''
        #!/bin/bash
        mismatch_profile.py -i {input} -o {rseqc_dir}/{wildcards.sample} -l {params.read_length}
        '''

However, that does not work. For some reason I am not allowed to use {sample} inside standard pandas syntax, and I get this error:

KeyError in line 41 of /rst1/2017-0205_illuminaseq/scratch/swo-406/test_snakemake_full/rules/rseqc.smk:
'the label [{sample}] is not in the [index]'

I don't understand why this does not work. I read that I can also use lambda functions, but I don't really understand how, since they still need {sample} as input.

Can somebody help me?

Recommended answer

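You could use a lambda function. The expression samples.loc['{sample}'] is evaluated as plain Python when the rule is parsed, so pandas looks for a row literally labelled '{sample}'. A function given to params is instead called for each job with that job's wildcards: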

params:
    read_length = lambda wildcards: samples.loc[wildcards.sample, 'Read1']
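
To see the lookup outside of Snakemake, here is a minimal sketch in plain Python. It assumes the sample table is indexed by sample name; the table contents, the sample names, the helper name get_read_length and the SimpleNamespace stand-in for the wildcards object are all made up for illustration and are not from the original workflow.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical sample table. In a real workflow it would be read from a file
# and indexed by sample name so that .loc can look rows up by label.
samples = pd.DataFrame(
    {"sample": ["sampleA", "sampleB"], "Read1": [75, 100]}
).set_index("sample")


def get_read_length(wildcards):
    """Return the Read1 value for the sample named in the wildcards."""
    # int() turns the NumPy integer into a plain Python int.
    return int(samples.loc[wildcards.sample, "Read1"])


# Stand-in for the wildcards object that Snakemake passes to params functions.
wildcards = SimpleNamespace(sample="sampleB")
print(get_read_length(wildcards))

# The literal string '{sample}' is not a row label, so this lookup fails
# in the same way as the rule in the question.
try:
    samples.loc["{sample}", "Read1"]
except KeyError as err:
    print("KeyError:", err)
```

A named function like this can be used in place of the lambda (read_length = get_read_length), which may be easier to read once the lookup needs more than one line.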

