Problem description
The source code tree (R) for my dissertation research software reflects a traditional research workflow: "collect data -> prepare data -> analyze data -> collect results -> publish results". I use make to establish and maintain the workflow (most of the project's sub-directories contain Makefile files).
However, I frequently need to execute individual parts of my workflow via particular Makefile targets in the project's sub-directories (not via the top-level Makefile). This creates the problem of setting up Makefile rules that maintain dependencies between targets from different parts of the workflow, in other words, between targets in Makefile files located in different sub-directories.
The following represents the setup for my dissertation project:
+-- diss-floss (Project's root)
    |-- import (data collection)
    |-- cache (R data objects (), representing different data sources, in sub-directories)
    |-+ prepare (data cleaning, transformation, merging and sampling)
        |-- R modules, including 'transform.R'
    |-- analysis (data analyses, including exploratory data analysis (EDA))
        |-- R modules, including 'eda.R'
    |-+ results (results of the analyses, in sub-directories)
        |-+ eda (*.svg, *.pdf, ...)
        |-- ...
    |-- present (auto-generated presentation for defense)
Snippets of targets from some of my Makefile files:
"~/diss-floss/Makefile" (almost full):
# Major variable definitions
PROJECT="diss-floss"
HOME_DIR="~/diss-floss"
REPORT={$(PROJECT)-slides}
COLLECTION_DIR=import
PREPARATION_DIR=prepare
ANALYSIS_DIR=analysis
RESULTS_DIR=results
PRESENTATION_DIR=present
RSCRIPT=Rscript

# Targets and rules
all: rprofile collection preparation analysis results presentation

rprofile:
	R CMD BATCH ./.Rprofile

collection:
	cd $(COLLECTION_DIR) && $(MAKE)

preparation: collection
	cd $(PREPARATION_DIR) && $(MAKE)

analysis: preparation
	cd $(ANALYSIS_DIR) && $(MAKE)

results: analysis
	cd $(RESULTS_DIR) && $(MAKE)

presentation: results
	cd $(PRESENTATION_DIR) && $(MAKE)

## Phony targets and rules (for commands that do not produce files)
#.html
.PHONY: demo clean

# run demo presentation slides
demo: presentation
# knitr(Markdown) => HTML page
# HTML5 presentation via RStudio/RPubs or Slidify
# OR
# Shiny app

# remove intermediate files
clean:
	rm -f tmp*.bz2 *.Rdata
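One detail of the top-level Makefile above may be worth a second look: analysis and results are target names and, at the same time, existing directories, and none of the stage targets is declared phony. A possible variant is sketched below, assuming GNU make. It names the stage targets after their directories (so collection becomes import, and so on), leaves out the rprofile, demo and clean targets for brevity, and introduces a STAGES variable purely for illustration:

```makefile
# Sketch of a top-level Makefile, assuming GNU make.
# STAGES is a hypothetical helper variable, not part of the original project.
STAGES := import prepare analysis results present

# The stage names are also directory names, so they must be phony;
# otherwise make would look at the directories' timestamps.
.PHONY: all $(STAGES)
all: $(STAGES)

# Stage order, expressed as prerequisites between the phony stage targets.
prepare:  import
analysis: prepare
results:  analysis
present:  results

# One recipe shared by every stage: run that directory's own Makefile.
$(STAGES):
	$(MAKE) -C $@
```

Because every stage is phony, the top-level Makefile always descends into each directory and leaves the decision about what is out of date to the Makefile found there.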
"~/diss-floss/import/Makefile":
importFLOSSmole: getFLOSSmoleDataXML.R
	@$(RSCRIPT) $(R_OPTS) $<
...
"~/diss-floss/prepare/Makefile":
transform: transform.R
	$(RSCRIPT) $(R_OPTS) $<
...
"~/diss-floss/analysis/Makefile":
eda: eda.R
	@$(RSCRIPT) $(R_OPTS) $<
Currently, I am concerned about creating the following dependency: data collected by making a target from the Makefile in import always needs to be transformed by making the corresponding target from the Makefile in prepare before being analyzed via, for example, eda.R. If I manually run make in import and then, forgetting about the transformation, run make eda in analysis, things do not go too well. Therefore, my question is:
How could I use features of the make utility (in the simplest way possible) to establish and maintain rules for dependencies between targets from Makefile files in different directories?
The following are my thoughts (with some ideas from @MrFlick's answer - thank you) on adding my research workflow's data dependencies to the project's current make infrastructure (with snippets of code). I have also tried to reflect the desired workflow by specifying dependencies between make targets.
import/Makefile:
importFLOSSmole: getFLOSSmoleDataXML.R FLOSSmole.RData
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done
(similar targets for other data sources)
prepare/Makefile:
IMPORT_DIR=../import

prepare: import \
         transform \
         cleanup \
         merge \
         sample

import: $(IMPORT_DIR)/importFLOSSmole.done # and/or other flag files, as needed

transform: transform.R import
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

cleanup: cleanup.R transform
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

merge: merge.R cleanup
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

sample: sample.R merge
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done
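A note on the prepare/Makefile above: the targets (transform, cleanup, ...) are never created as files, while the flag files that are created (transform.done, ...) are not the target of any rule. As written, make therefore has no timestamps to compare and would re-run every step on each invocation. A minimal sketch of the same chain with the flag files themselves as targets follows. It assumes GNU make and that each step is a single R script named after the step:

```makefile
# Sketch of prepare/Makefile using the flag files as real targets
# (GNU make assumed).
RSCRIPT    ?= Rscript
IMPORT_DIR := ../import

.PHONY: prepare
prepare: sample.done

# Generic step: run <step>.R, then record success in <step>.done.
# $< is the script (first prerequisite of the pattern rule), $@ is the flag file.
%.done: %.R
	$(RSCRIPT) $(R_OPTS) $<
	touch $@

# Order of the steps: these rules only add prerequisites, they have no recipes.
transform.done: $(IMPORT_DIR)/importFLOSSmole.done
cleanup.done:   transform.done
merge.done:     cleanup.done
sample.done:    merge.done
```

With this layout a step is re-run only when its script or the preceding flag file is newer than its own flag file. If ../import/importFLOSSmole.done is missing, this Makefile has no rule that could create it, so make stops with an error instead of transforming data that was never imported.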
analysis/Makefile:
PREP_DIR=../prepare

analysis: prepare \
          eda \
          efa \
          cfa \
          sem

prepare: $(PREP_DIR)/transform.done # and/or other flag files, as needed

eda: eda.R prepare
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

efa: efa.R eda
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

cfa: cfa.R efa
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done

sem: sem.R cfa
	@$(RSCRIPT) $(R_OPTS) $<
	@touch $@.done
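For the cross-directory part of the question, one commonly used approach is to let analysis/Makefile hand the freshness check of the prepare flag file over to prepare's own Makefile through a recursive $(MAKE) -C call. The sketch below assumes GNU make, and it assumes that prepare/Makefile has a rule whose target is the file transform.done (which is not the case in the snippets above). FORCE is just an illustrative name for a target that is always out of date:

```makefile
# Sketch of analysis/Makefile with a dependency on ../prepare
# (GNU make assumed).
RSCRIPT  ?= Rscript
PREP_DIR := ../prepare

.PHONY: analysis FORCE
analysis: eda.done

# Always ask prepare's Makefile to bring transform.done up to date.
# Assumption: prepare/Makefile defines a file target named transform.done.
# The sub-make re-runs the transformation only if that flag file is stale.
$(PREP_DIR)/transform.done: FORCE
	$(MAKE) -C $(PREP_DIR) transform.done

# eda.R is re-run when eda.done is missing, or when the script or the
# prepared-data flag file is newer than eda.done.
eda.done: eda.R $(PREP_DIR)/transform.done
	$(RSCRIPT) $(R_OPTS) $<
	touch $@

FORCE:
```

Running make in analysis would then first give prepare a chance to redo the transformation, and only afterwards decide whether eda.R needs to run again. The efa, cfa and sem steps could be chained to eda.done in the same way.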
The contents of the Makefile files in the results and present directories are still TBD.
I would appreciate your thoughts and advice on the above!