本文介绍了接受略有不同的Snakemaker规则输入(.fq与.fq.gz)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我是Snakemake的新手,我希望能够获取一对.fq
文件或一对.fq.gz
文件,并通过trim_galore
运行它们来获得一对经过修剪的.fq.gz
输出文件。在不给出我所有的Snakefile的情况下,我有了下面这个难看的解决方案,我只是复制了规则并更改了输入。什么是更好的解决方案?#Trim galore paired end trimming rule for unzipped fastqs:
rule trim_galore_unzipped_PE:
input:
r1=join(config['fq_in_path'], '{sample}1.fq'),
r2=join(config['fq_in_path'], '{sample}2.fq'),
output:
r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
params:
out_path=config['trim_out_path'],
conda:
'envs/biotools.yaml',
shell:
'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'
#Trim galore paired end trimming rule for gzipped fastqs:
rule trim_galore_zipped_PE:
input:
r1=join(config['fq_in_path'], '{sample}1.fq.gz'),
r2=join(config['fq_in_path'], '{sample}2.fq.gz'),
output:
r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
params:
out_path=config['trim_out_path'],
conda:
'envs/biotools.yaml',
shell:
'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'
推荐答案
使用输入函数可能是最佳解决方案,如下所示:
- 将通配符传递给输入函数
- 使用已知的YAML值,使用该示例名称构建理论文件名。
- 使用python函数检查哪个文件(从技术上讲是文件后缀)有效
- 生成有效文件列表
- 返回并解压缩有效文件列表。
备注:
- 输入和输出应具有相同的通配符,否则会导致问题
- 在输入函数中,确保它不能返回空字符串,因为Snakemake将其解释为"缺少输入"要求,而这不是您想要的。
- 如果您采纳这些建议,请更新规则名称,我忘记了。
Snakefile:
configfile: "config.yaml"
from os.path import join
from os.path import exists
rule all:
input:
expand("{trim_out_path}/{sample}.{readDirection}.fq.gz",
trim_out_path=config["trim_out_path"],
sample=config["sampleList"],
readDirection=['1','2'])
def trim_galore_input_determination(wildcards):
potential_file_path_list = []
# Cycle through both suffix possibilities:
for fastqSuffix in [".fq", ".fq.gz"]:
# Cycle through both read directions
for readDirection in ['.1','.2']:
#Build the list for ech suffix
potential_file_path = config["fq_in_path"] + "/" + wildcards.sample + readDirection + fastqSuffix
#Check if this file actually exists
if exists(potential_file_path):
#If file is legit, add to list of acceptable files
potential_file_path_list.append(potential_file_path)
# Checking for an empty list
if len(potential_file_path_list):
return potential_file_path_list
else:
return ["trim_galore_input_determination_FAILURE" + wildcards.sample]
rule trim_galore_unzipped_PE:
input:
unpack(trim_galore_input_determination)
output:
expand("{trim_out_path}/{{sample}}.{readDirection}.fq.gz",
trim_out_path=config["trim_out_path"],
readDirection=['1','2'])
params:
out_path=config['trim_out_path'],
conda:
'envs/biotools.yaml',
shell:
'trim_galore --gzip -o {params.out_path} --paired {input}'
config.yaml:
fq_in_path: input/fq
trim_out_path: output
sampleList: ["mySample1", "mySample2"]
$tree:
|-- [tboyarsk 1540 Sep 6 15:17] Snakefile
|-- [tboyarsk 82 Sep 6 15:17] config.yaml
|-- [tboyarsk 512 Sep 6 8:55] input
| |-- [tboyarsk 512 Sep 6 8:33] fq
| | |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq
| | |-- [tboyarsk 0 Sep 6 8:24] mySample1.2.fq
| | |-- [tboyarsk 0 Sep 6 7:50] mySample2.1.fq
| | `-- [tboyarsk 0 Sep 6 8:24] mySample2.2.fq
| `-- [tboyarsk 512 Sep 6 8:55] fqgz
| |-- [tboyarsk 0 Sep 6 7:50] mySample1.1.fq.gz
| |-- [tboyarsk 0 Sep 6 8:32] mySample1.2.fq.gz
| |-- [tboyarsk 0 Sep 6 8:33] mySample2.1.fq.gz
| `-- [tboyarsk 0 Sep 6 8:32] mySample2.2.fq.gz
`-- [tboyarsk 512 Sep 6 7:55] output
$Snakemake-Dry(输入:fg)
rule trim_galore_unzipped_PE:
input: input/fq/mySample1.1.fq, input/fq/mySample1.2.fq
output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
jobid: 1
wildcards: sample=mySample1
rule trim_galore_unzipped_PE:
input: input/fq/mySample2.1.fq, input/fq/mySample2.2.fq
output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
jobid: 2
wildcards: sample=mySample2
localrule all:
input: output/mySample1.1.fq.gz, output/mySample2.1.fq.gz, output/mySample1.2.fq.gz, output/ mySample2.2.fq.gz
jobid: 0
Job counts:
count jobs
1 all
2 trim_galore_unzipped_PE
3
$Snakemake-Dry(输入:fgqz)
rule trim_galore_unzipped_PE:
input: input/fqgz/mySample1.1.fq.gz, input/fqgz/mySample1.2.fq.gz
output: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz
jobid: 1
wildcards: sample=mySample1
rule trim_galore_unzipped_PE:
input: input/fqgz/mySample2.1.fq.gz, input/fqgz/mySample2.2.fq.gz
output: output/mySample2.1.fq.gz, output/mySample2.2.fq.gz
jobid: 2
wildcards: sample=mySample2
localrule all:
input: output/mySample1.1.fq.gz, output/mySample1.2.fq.gz, output/mySample2.1.fq.gz, output/ mySample2.2.fq.gz
jobid: 0
Job counts:
count jobs
1 all
2 trim_galore_unzipped_PE
3
有一些方法可以使其更通用,但由于您声明并使用YAML配置来构建大部分文件名,因此我将避免在答案中讨论它。我只是说这是可能的,并在某种程度上受到鼓励。"--配对的{输入}"将展开以提供这两个文件。由于for循环,%1将始终位于%2之前。
这篇关于接受略有不同的Snakemaker规则输入(.fq与.fq.gz)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!