Problem description
Say I'm following the best-practice workflow suggested for Snakemake. Now I'd like to know how (i.e. with which software versions) a given file, say plots/myplot.pdf, was generated. I found this surprisingly hard, if not impossible, with only the results folder at hand.
In more detail, say I generated the results using snakemake --use-conda --conda-prefix ~/.conda/myenvs, which resolves and downloads the conda environments specified in the rule below (copied from the documentation):
rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    conda:
        "envs/ggplot.yaml"
    script:
        "scripts/plot-stuff.R"
Say envs/ggplot.yaml has the following content:
channels:
  - conda-forge
dependencies:
  - r-ggplot2
After completion, the ggplot environment will have been saved under, say, ~/.conda/myenvs/d2d1d57b (note that the environment name d2d1d57b is assigned automatically by Snakemake).
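To at least find out which environments Snakemake has created under the conda prefix, one could look for directories that contain a conda-meta subdirectory, which conda creates in every environment. The following is a minimal Python sketch under that assumption; the function name find_conda_envs is hypothetical, and ~/.conda/myenvs is simply the prefix passed via --conda-prefix in the command above.

```python
from pathlib import Path


def find_conda_envs(conda_prefix):
    """Return the environment directories found directly under conda_prefix.

    Assumption: a directory is treated as a conda environment if it contains
    a 'conda-meta' subdirectory (conda creates one in every environment).
    Other files Snakemake may keep in the prefix are ignored.
    """
    root = Path(conda_prefix).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        child for child in root.iterdir()
        if child.is_dir() and (child / "conda-meta").is_dir()
    )


if __name__ == "__main__":
    # ~/.conda/myenvs is the --conda-prefix used in the question.
    for env_dir in find_conda_envs("~/.conda/myenvs"):
        print(env_dir)
```

This only maps hash-named directories to environments; it does not tell you which rule used which one, so it is best combined with a per-rule export such as the one in the answer below.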
The problem is that if I ship the workflow subfolder, e.g. as the result to someone else (or as a supplement to a paper), I don't know which ggplot version was used for that run. All I know is the content of the YAML file (which is also reported when using --report). Also, since ggplot depends on other software, for instance R, I wouldn't know which R version was used by a given rule using this environment, because the YAML file doesn't list indirect dependencies.
Ideally, I'd like to have the complete software versions of each environment shipped with the workflow results. As a workaround, one could use conda env export -n name_of_env and copy the output into the results folder, but strangely conda list -n ~/.conda/myenvs/d2d1d57b does not work (it fails with the error Characters not allowed: ('/', ' ', ':', '#')).
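The error most likely arises because -n/--name expects an environment name, and conda rejects names containing those characters; an environment addressed by its path is selected with -p/--prefix instead (e.g. conda list -p ~/.conda/myenvs/d2d1d57b). Below is a small Python sketch that builds and runs such commands for a Snakemake-created environment. The helper names conda_commands and export_env are hypothetical, and conda is assumed to be on PATH.

```python
import subprocess
from pathlib import Path


def conda_commands(env_dir):
    """Build conda commands that address an environment by its path.

    -n/--name only accepts an environment *name*; a path such as
    ~/.conda/myenvs/d2d1d57b has to be passed via -p/--prefix instead.
    """
    prefix = str(Path(env_dir).expanduser())
    return {
        "list": ["conda", "list", "--prefix", prefix],
        "export": ["conda", "env", "export", "--prefix", prefix],
    }


def export_env(env_dir, out_file):
    """Run `conda env export --prefix ...` and save its YAML output to out_file."""
    result = subprocess.run(
        conda_commands(env_dir)["export"],
        capture_output=True,
        text=True,
        check=True,  # raise if conda reports an error
    )
    Path(out_file).write_text(result.stdout)
```

For example, export_env("~/.conda/myenvs/d2d1d57b", "results/ggplot_env.yaml") would write the full export, including indirect dependencies such as r-base, into a (hypothetical) results file.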
Creating an environment manually and inspecting it indeed gives me (among other info):
r-base 4.0.2 he766273_1 conda-forge
r-ggplot2 3.3.2 r40h6115d3f_0 conda-forge
That's exactly what I'm after, but doing this manually would of course be too tedious.
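To avoid reading that output by hand, the tabular output of conda list could be parsed into a name-to-version mapping. A minimal Python sketch, assuming the whitespace-separated Name/Version/Build/Channel columns shown above; parse_conda_list is a hypothetical helper, and the header path in the sample text is made up for illustration.

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}.

    Comment lines (starting with '#') and blank lines are skipped. Rows
    without a channel column get an empty string as channel.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


sample = """\
# packages in environment at /home/user/.conda/myenvs/d2d1d57b:
#
# Name                    Version                   Build  Channel
r-base                    4.0.2                he766273_1    conda-forge
r-ggplot2                 3.3.2             r40h6115d3f_0    conda-forge
"""
versions = parse_conda_list(sample)
print(versions["r-base"][0], versions["r-ggplot2"][0])  # 4.0.2 3.3.2
```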
As far as I can tell, the same is true when using wrappers.
In summary: given a workflow, or even a given file within the workflow, how can I trace back which exact software versions were used to generate it? Ideally, this information would be shipped automatically with the results of a workflow by default.
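One way to get such a record without running conda at all is to read the per-package JSON files that conda keeps in each environment's conda-meta directory; each of them carries at least the package's name, version and build. The sketch below relies on that layout as an assumption; env_packages and write_manifest are hypothetical helpers, and where the manifest is written (e.g. next to the results at the end of a run) is up to the workflow.

```python
import json
from pathlib import Path


def env_packages(env_dir):
    """Return sorted (name, version, build) tuples for the packages in env_dir.

    Reads the JSON records in <env_dir>/conda-meta/ (one per installed
    package), so no conda command has to be run. Non-JSON files there,
    such as 'history', are ignored by the glob.
    """
    records = []
    for meta_file in Path(env_dir, "conda-meta").glob("*.json"):
        info = json.loads(meta_file.read_text())
        records.append((info["name"], info["version"], info.get("build", "")))
    return sorted(records)


def write_manifest(env_dir, out_file):
    """Write one 'name version build' line per package to out_file."""
    lines = [" ".join(rec) for rec in env_packages(env_dir)]
    Path(out_file).write_text("\n".join(lines) + "\n")
```

Because this lists every installed package, indirect dependencies such as r-base show up alongside r-ggplot2, which is exactly the information the YAML file lacks.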
Maybe I'm even missing something very obvious, so hopefully someone can shed some light on this.
Recommended answer
Based on our discussion in the comments, you could write the exported environment to the rule's log file:
rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    log:
        "mylog.txt"
    conda:
        "envs/ggplot.yaml"
    shell:
        """
        conda env export > {log}
        yourcode
        """
Which adds the export to each shell command!
Now, if you use scripts, I'm no longer sure how to proceed. The "easiest" option might be to just call conda env export via a shell command from inside Python/R.
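For a Python script run via script:, that idea could look like the sketch below. It assumes that Snakemake activated the rule's environment with conda activate (which sets CONDA_PREFIX), that conda is on PATH inside the rule, and that the rule declares a log: file as in the shell example above; write_env_export is a hypothetical helper, and the cmd parameter exists only so it can be tried outside Snakemake.

```python
import os
import subprocess


def write_env_export(log_path, cmd=None):
    """Write the YAML export of the active conda environment to log_path.

    By default runs `conda env export --prefix $CONDA_PREFIX`; CONDA_PREFIX
    is assumed to have been set by the environment activation.
    """
    if cmd is None:
        cmd = ["conda", "env", "export", "--prefix", os.environ["CONDA_PREFIX"]]
    with open(log_path, "w") as log:
        # The command's stdout goes straight into the log file.
        subprocess.run(cmd, stdout=log, check=True)


# In a script: file Snakemake provides a `snakemake` object; the guard lets
# this file also be imported (e.g. for testing) outside of Snakemake.
if "snakemake" in globals():
    write_env_export(snakemake.log[0])
    # ... the rule's actual work goes here ...
```

For an R script such as scripts/plot-stuff.R, the analogous approach would presumably be a system() call that redirects conda env export into snakemake@log[[1]].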
Edit: the shell prefix trick does not seem to work, so I struck through that text.