Problem description
I have a list object holding ChIP-seq single-end fastq file names: allfiles=['/path/file1.fastq','/path/file2.fastq','/path/file3.fastq']. I'm trying to use that object, allfiles, as a wildcard: I want it to be the input of the fastqc rule (and of other rules such as mapping, but let's keep it simple). I tried the input function shown in the code below (lambda wildcards: data.loc[(wildcards.sample),'read1']). This, however, gives me the error:
"InputFunctionException in line 118 of Snakefile:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:
"
Does anyone know how to define this correctly? It seems I am close and I get the general idea, but I can't get the syntax right and make it run. Thank you!
Code:
import pandas as pd
import numpy as np
# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # three columns: sample ID , /path/to/chipseq_file_SE.fastq , /path/to/chipseq_input.fastq
outputDir = config['outputdir'] # output directory
outDir = outputDir + "/MyExperiment"
qcDir = outDir + "/QC"
# Read in the samples table
data = pd.read_csv(sampleFile, header=0, names=['sample', 'read1', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample IDs
read1 = data['read1'].unique().tolist() # ChIP-treatment file single-end file
inplist= data['inputs'].unique().tolist() # the ChIP-input files
inplistUni= data['inputs'].unique().tolist() # the ChIP-input files (unique)
allfiles = read1 + inplistUni
# Target rule
rule all:
    input:
        expand(f'{qcDir}' + '/raw/{sample}_fastqc.html', sample=samples),
        expand(f'{qcDir}' + '/raw/{sample}_fastqc.zip', sample=samples),

# fastqc report generation
rule fastqc:
    input: lambda wildcards: data.loc[(wildcards.sample), 'read1']
    output:
        html=expand(f'{qcDir}' + '/raw/{sample}_fastqc.html', sample=samples),
        zip=expand(f'{qcDir}' + '/raw/{sample}_fastqc.zip', sample=samples)
    log: expand(f'{logDir}' + '/qc/{sample}_fastqc_raw.log', sample=samples)
    threads: 4
    wrapper: "fastqc {input} 2>> {log}"
Recommended answer
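Currently, the output files of rule fastqc do not contain any wildcards once they are resolved, because expand() substitutes every value of samples into {sample} while the Snakefile is parsed. That is, the Snakefile currently defines a single fastqc job that tries to produce the output files for all samples at once. With no wildcard left in the output, the wildcards object passed to the input function is empty, which is why wildcards.sample raises the AttributeError (note the empty "Wildcards:" section in the error message).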
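The effect described above can be reproduced outside Snakemake. The following is a minimal sketch in plain Python, not Snakemake's actual implementation: expand_pattern and wildcard_names are hypothetical helpers written for this illustration, the sample IDs s1 and s2 are made up, and an empty SimpleNamespace stands in for the empty wildcards object.

```python
import re
from types import SimpleNamespace


def expand_pattern(pattern, name, values):
    """Simplified stand-in for expand(): fill one placeholder with each value."""
    return [pattern.replace("{" + name + "}", str(value)) for value in values]


def wildcard_names(pattern):
    """Return the placeholder names that are still present in a pattern."""
    return re.findall(r"\{(\w+)\}", pattern)


samples = ["s1", "s2"]  # hypothetical sample IDs
pattern = "QC/raw/{sample}_fastqc.html"

# The raw pattern still carries the {sample} placeholder.
names_before = wildcard_names(pattern)

# After expansion every path is concrete, so no placeholder is left.
expanded = expand_pattern(pattern, "sample", samples)
names_after = [wildcard_names(path) for path in expanded]

# With nothing to match, the wildcards object has no attributes at all.
empty_wildcards = SimpleNamespace()
try:
    empty_wildcards.sample
    lookup_failed = False
except AttributeError:
    lookup_failed = True
```

Here names_before lists the sample placeholder, while every entry of names_after is empty, which mirrors the situation in which the input function is called with nothing to look up.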
However, it appears you would like to run rule fastqc separately for each sample. In that case, the rule needs to be generalized as below, where {sample} is the wildcard:
# logDir is assumed to be defined elsewhere in the Snakefile
# (it is not defined in the code shown in the question).
rule fastqc:
    input: lambda wildcards: data.loc[(wildcards.sample), 'read1']
    output:
        html=qcDir + '/raw/{sample}_fastqc.html',
        zip=qcDir + '/raw/{sample}_fastqc.zip'
    log: logDir + '/qc/{sample}_fastqc_raw.log'
    threads: 4
    shell: "fastqc {input} 2>> {log}"
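To see why keeping {sample} in the output makes the input function work, here is a rough sketch in plain Python of what happens for one requested target. It is only an illustration under stated assumptions: match_sample is a hypothetical helper that mimics matching a target against the output pattern, the dictionary table stands in for the pandas sample table, and the sample IDs and paths are invented.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the sample table (sample ID -> read1 path).
table = {
    "s1": {"read1": "/path/file1.fastq"},
    "s2": {"read1": "/path/file2.fastq"},
}

output_pattern = "QC/raw/{sample}_fastqc.html"


def match_sample(pattern, target):
    """Match a target file against a pattern with one {sample} placeholder."""
    prefix, suffix = pattern.split("{sample}")
    if target.startswith(prefix) and target.endswith(suffix):
        value = target[len(prefix):len(target) - len(suffix)]
        return SimpleNamespace(sample=value)
    return None


def fastqc_input(wildcards):
    """Same idea as the lambda in the rule: look up read1 for this sample."""
    return table[wildcards.sample]["read1"]


# One target requested by rule all leads to one job with its own wildcards.
wildcards = match_sample(output_pattern, "QC/raw/s2_fastqc.html")
input_file = fastqc_input(wildcards)
```

Each target requested by rule all is matched against the pattern separately, so each job receives its own sample value and therefore its own input file.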