Problem description
Is it possible to define default settings for memory and resources in the cluster config file, and then override them in a rule-specific manner when needed? Is the resources field in rules directly tied to the cluster config file? Or is it just a fancy alternative to the params field, for readability purposes?
In the example below, how do I use the default cluster config for rule a, but use custom values (memory=40000 and rusage=15000) in rule b?
cluster.json:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
Snakefile:
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'
Command used to run it:
snakemake --cluster-config cluster.json \
    --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
    -j 50
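To see what the placeholders in the --cluster string turn into, the sketch below imitates the substitution with plain str.format and types.SimpleNamespace, using the cluster.json above. It is only an illustration under that assumption, not Snakemake's internal code, but it shows why the resources value carries escaped quotes: they keep the -R argument together as a single shell word.

```python
import json
import shlex
from types import SimpleNamespace

# The question's cluster.json (strict JSON, so no trailing comma).
CLUSTER_JSON = r'''
{
    "__default__": {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
'''

SUBMIT = "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt"


def render_submit(cluster_entry):
    """Fill {cluster.<key>} placeholders, imitating the --cluster template."""
    return SUBMIT.format(cluster=SimpleNamespace(**cluster_entry))


config = json.loads(CLUSTER_JSON)
cmd = render_submit(config["__default__"])
print(cmd)
# bsub -M 20000 -R "rusage[mem=8000] span[hosts=1]" -o logs.txt

# The literal quotes from the JSON keep the -R value as one token when the shell splits the line.
print(shlex.split(cmd))
# ['bsub', '-M', '20000', '-R', 'rusage[mem=8000] span[hosts=1]', '-o', 'logs.txt']
```

Without the escaped quotes in the JSON, the space inside the resource string would split it into two separate arguments for bsub.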
I understand that it is possible to define rule specific resources requirements in cluster config file, but I would prefer to define them directly in Snakefile, if possible.
Or else, if there is a better way of implementing this, please let me know.
Recommended answer
In new.cluster.json you can actually define resources for specific rules, using the rule name as the key. So in your case you would do the following:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
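As far as I understand Snakemake's behavior, the cluster settings for a job are the __default__ entry updated with the entry whose key matches the rule name. Keys missing from "b" therefore fall back to the defaults, and rule a, which has no entry, gets the defaults unchanged. The sketch below reproduces that lookup with a hypothetical helper, cluster_params_for; the "b" entry here deliberately omits output and error to show that they are inherited.

```python
import json

# A variant of new.cluster.json in which "b" only overrides memory and resources.
NEW_CLUSTER_JSON = r'''
{
    "__default__": {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b": {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\""
    }
}
'''


def cluster_params_for(rule_name, cluster_config):
    """Hypothetical helper: start from __default__, then let the rule's entry override matching keys."""
    params = dict(cluster_config.get("__default__", {}))
    params.update(cluster_config.get(rule_name, {}))
    return params


config = json.loads(NEW_CLUSTER_JSON)
a_params = cluster_params_for("a", config)
b_params = cluster_params_for("b", config)
print(a_params["memory"], a_params["resources"])  # 20000 "rusage[mem=8000] span[hosts=1]"
print(b_params["memory"], b_params["resources"])  # 40000 "rusage[mem=15000] span[hosts=1]"
print(b_params["output"])  # inherited from __default__
```

If this merge behavior holds, repeating output and error under "b" in the answer is harmless but redundant.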
Then in the Snakefile you can refer to these resources by importing new.cluster.json and referring to it in your rule:
import json

# Load the cluster config so its values can be referenced from within rules.
with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        # Take rule b's memory from its own entry in new.cluster.json.
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'
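Indexing cluster_config["b"]["memory"] raises a KeyError for any rule that has no entry of its own, such as rule a. A small helper that falls back to __default__ lets every rule read its value the same way; the name cluster_value is made up for illustration. Because a Snakefile is Python, such a function could sit next to the json.load call; it is shown here as plain Python with an inline config so it runs on its own.

```python
import json

# Minimal stand-in for new.cluster.json (only the memory keys are needed here).
CLUSTER_CONFIG = json.loads('''
{
    "__default__": {"memory": 20000},
    "b": {"memory": 40000}
}
''')


def cluster_value(rule_name, key, cluster_config=CLUSTER_CONFIG):
    """Hypothetical helper: return the rule's own value if present, else the __default__ value."""
    rule_entry = cluster_config.get(rule_name, {})
    if key in rule_entry:
        return rule_entry[key]
    return cluster_config["__default__"][key]


# In a Snakefile this might be used as, e.g.:
#     resources:
#         mem_mb=cluster_value("b", "memory")
print(cluster_value("a", "memory"))  # 20000 (no "a" entry, falls back to __default__)
print(cluster_value("b", "memory"))  # 40000
```

As far as I know, a rule's resources values are not fed back into the cluster config; they only reach bsub if the --cluster string refers to them (for example -M {resources.mem_mb}), whereas the question's command still reads {cluster.memory}.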
If you take a look through this repository, you can see how I use these cluster configs in the wild.