Problem description

I am trying to understand how here() works in a portable way. Found it: see what works under "Final answer - TL;DR" further down. The bottom line is that here() is not really that useful when running a script.R from the command line.

The way I understand it, with help from JBGruber: here() looks for the root directory of a project (e.g., an RStudio project, a Git project, or another project defined with a .here file), starting at the current working directory and moving up until it finds one. If it doesn't find anything, it falls back to the current working directory, which for a script run by cron defaults to my home directory. One could, of course, pass the directory as a parameter via the cron command, but that is rather cumbersome. The answers below provide good explanations, and I have summarised what I found most immediately useful under the "Final answer" section. But make no mistake, Nicola's answer is very good and helpful too.
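
The lookup described above can be sketched in a few lines of base R. This is only an illustration of the upward search, not the code the here package actually uses. The helper name find_root_sketch() and the decision to look only for a .here marker file are assumptions made for the example.

```r
# Hypothetical helper (not part of the here package): walk upwards from
# `start` until a directory containing a `.here` marker file is found.
# If no marker is found, fall back to the starting directory.
find_root_sketch <- function(start = getwd(), marker = ".here") {
  start <- normalizePath(start, winslash = "/", mustWork = TRUE)
  dir <- start
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # reached the filesystem root without finding a marker
      return(start)
    }
    dir <- parent
  }
}

# Demonstration in a throwaway directory tree
root <- file.path(tempdir(), "myGoodScripts")
dir.create(file.path(root, "bkp"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".here"))

# Starting inside bkp/, the search climbs one level and stops at the marker
found_root <- find_root_sketch(file.path(root, "bkp"))
```

The important detail is the default value of start: the search begins at the working directory of the R process, not at the location of the script file, which is why a cron job started in the home directory never reaches the marker.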

Original objective: write a set of R scripts, including R Markdown (.Rmd) files, so that I can zip the directory, send it to someone else, and have it run on their computer, potentially a very low-end one such as a Raspberry Pi or old hardware running Linux.

Conditions:

  • can be run from the command line via Rscript
  • as above, but scheduled via cron
  • the main method for setting up the working directory is set_here(), executed once from the console; the folder is then portable because the .here file is included in the zipped directory
  • does not need RStudio, hence I do not want to use R projects
  • can also be run interactively from RStudio (development)
  • can be executed from shiny (I assume that will be OK if the above conditions are met)

I specifically do not want to create RStudio projects because, in my view, that makes it necessary to install and use RStudio, and I want my scripts to be as portable as possible and to run on low-resource, headless platforms.

Let us assume the working directory is myGoodScripts, as follows:

/Users/john/src/myGoodScripts/

When starting development, I would go to the above directory with setwd() and execute set_here() to create the .here file. The directory then contains two scripts, dataFetcherMailer.R and dataFetcher.Rmd, and a subdirectory bkp:

dataFetcherMailer.R

library(here)
library(knitr)

# here() should return the directory that contains the .here file
basedir <- here()

rmarkdown::render(file.path(basedir, "dataFetcher.Rmd"))

# email the created report
# email_routine_with_gmailr(file.path(basedir, "dataFetcher.pdf"))
# now substituted with verification that a pdf report was created
file.exists(file.path(basedir, "dataFetcher.pdf"))

dataFetcher.Rmd

---
title: "Data collection control report"
author: "HAL"
date: "`r Sys.Date()`"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)
library(here)

basedir <- here()

# in actual program this reads data from a changing online data source
df.main <- mtcars

# data backup
datestamp <- format(Sys.time(),format="%Y-%m-%d_%H-%M")
backupName <- paste0(basedir,"/bkp/dataBackup_",datestamp,".csv.gz")
write.csv(df.main, gzfile(backupName))
```

# This is data collection report

Yesterday's data total records: `r nrow(df.main)`.

The basedir was `r basedir`

The current directory is `r getwd()`

The here path is `r here()`

I would guess that the last three lines of the report match. Even if getwd() does not match the other two, it should not matter, because here() would ensure an absolute base path.

Of course, the above does not work. It only works if I execute Rscript ./dataFetcherMailer.R from within the myGoodScripts/ directory itself.

My aim is to understand how to execute the scripts so that relative paths are resolved relative to the script's location and the script can be run from the command line independently of the current working directory. Right now I can run this from bash only after I cd into the directory containing the script. If I schedule cron to execute the script, the default working directory is /home/user and the script fails. My naive assumption was that, regardless of the shell's current working directory, basedir <- here() would give a point in the filesystem from which relative paths could be resolved. That assumption does not hold.

From RStudio without a prior setwd():

here() starts at /home/user
Error in abs_path(input) :
The file '/home/user/dataFetcher.Rmd' does not exist.

From bash with Rscript, if the cwd is not set to the script directory:

$ cd /home/user/src
$ Rscript ./myGoodScripts/dataFetcherMailer.R
here() starts at /home/user/src
Error in abs_path(input) :
The file '/home/user/src/dataFetcher.Rmd' does not exist.
Calls: <Anonymous> -> setwd -> dirname -> abs_path

If someone could help me understand and resolve this problem, that would be fantastic. If another reliable method to set the base path without here() exists, I would love to know. Ultimately, executing the script from RStudio matters a lot less than understanding how to execute such scripts from the command line or cron.

I modified the function a little so that it can return either the file name or the directory of the file. I am currently trying to modify it so that it works both when the .Rmd file is knitted from RStudio and when it is run via an R file.

here2 <- function(type = 'dir') {
  # type = 'dir' returns the containing directory, anything else the file path
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("interactive" %in% args) {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if ("--slave" %in% args) {
    # the path is taken from between single quotes in the sixth argument
    string <- args[6]
    mBtwSquotes <- "(?<=')[^']*(?=')"
    filepath <- regmatches(string, regexpr(mBtwSquotes, string, perl = TRUE))
  } else if (any(grepl("--file=", args, fixed = TRUE))) {
    # any(grepl()) is always TRUE or FALSE; pmatch() can return NA, which if() rejects
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else {
    # unknown way of calling: fall back to the current directory or an error marker
    if (type == 'dir') {
      filepath <- '.'
      return(filepath)
    } else {
      filepath <- "error"
      return(filepath)
    }
  }
  if (type == 'dir') {
    filepath <- dirname(filepath)
  }
  return(filepath)
}

I discovered, however, that commandArgs() are inherited from the R script, i.e. they remain the same for the .Rmd document when it is knitted from a script.R. Therefore only the base path derived from the script.R location can be used universally, not the file name. In other words, when this function is placed in a .Rmd file, it points to the path of the calling script.R, not to the path of the .Rmd file.
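
To make the point about commandArgs() concrete, here is a minimal sketch that takes the argument vector as a parameter, so the parsing can be tried without starting a new R process. The helper name script_dir_from_args() and the example vector are made up for illustration; the exact flags R reports differ between versions and front ends.

```r
# Hypothetical helper: extract the script directory from a vector shaped
# like the output of commandArgs(trailingOnly = FALSE).
script_dir_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- "^--file="
  hit <- grep(file_arg, args, value = TRUE)
  if (length(hit) == 0) {
    # the process was not started with a --file= argument
    return(NA_character_)
  }
  dirname(sub(file_arg, "", hit[1]))
}

# Made-up argument vector resembling an Rscript call
example_args <- c(
  "/usr/lib/R/bin/exec/R", "--no-echo", "--no-restore",
  "--file=/home/user/src/myGoodScripts/dataFetcherMailer.R"
)

script_dir <- script_dir_from_args(example_args)
no_script  <- script_dir_from_args(c("R", "--interactive"))
```

Because the arguments describe the R process rather than any particular file, a call to this helper from inside an .Rmd that is rendered by dataFetcherMailer.R sees the same vector and returns the same directory as a call from the script itself.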

A shorter version of this function is therefore more useful:

here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    # R script called from Rstudio with "source file button"
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("--slave" %in% args) {
    # Rmd file called from Rstudio with "knit button"
    # (if we placed this function in a .Rmd file)
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else if ((sum(grepl("--file=" ,args))) >0) {
    # called in some other way that passes --file= argument
    # R script called via cron or commandline using Rscript
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if (sum(grepl("rmarkdown::render" ,args)) >0 ) {
    # Rmd file called to render from commandline with
    # Rscript -e 'rmarkdown::render("RmdFileName")'
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- '(?<=")[^"]*(?=")'
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else {
    # we do not know what is happening; taking a chance; could have  error later
    filepath <- normalizePath(".")
    return(filepath)
  }
  filepath <- dirname(filepath)
  return(filepath)
}

NB: from within a .Rmd file, calling normalizePath(".") is enough to get the directory that contains the file. This works whether the .Rmd file is called from a script, from the command line, or from RStudio.

Recommended answer

What you asked for

The behaviour of here() isn't really what you want in this case, I think. Instead, what you are looking for is a way to determine the path of the source file, i.e. the .R file. I extended the here() command a little so that it behaves the way you expect:

here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    dirname(rstudioapi::getActiveDocumentContext()$path)
  } else {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
    dirname(filepath)
  }
}

The idea for the case when the script is not run in RStudio comes from this answer. I tried this by pasting the function definition at the beginning of your dataFetcherMailer.R file. You could also put it in another file in your home directory and load it with, e.g., source("here2.R") instead of library(here), or you could write a small R package for this purpose.

What I think most people actually need

I found this approach a while ago, but then changed my workflow entirely to use only R Markdown files (and RStudio projects). One of the advantages is that the working directory of an Rmd file is always the location of the file. So instead of bothering with setting a working directory, you can just write all paths in your script relative to the location of the Rmd file.

---
title: "Data collection control report"
author: "HAL"
date: "`r Sys.Date()`"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)

# in actual program this reads data from a changing online data source
df.main <- mtcars

# data backup
datestamp <- format(Sys.time(),format="%Y-%m-%d_%H-%M")

# create bkp folder if it doesn't exist
if (!dir.exists("./bkp/")) dir.create("./bkp/")

backupName <- paste0("./bkp/dataBackup_", datestamp, ".csv.gz")
write.csv(df.main, gzfile(backupName))
```

# This is data collection report

Yesterday's data total records: `r nrow(df.main)`.

The current directory is `r getwd()`

Note that a path starting with ./ starts in the folder of the Rmd file. ../ means you go one level up, ../../ two levels up, and so on. So if your Rmd file is in a folder called "scripts" in your root folder and you want to save your data in a folder called "data" in your root folder, you write saveRDS(data, "../data/dat.RDS").
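
The relative-path rule can be checked with a small base R experiment. This is a sketch under assumptions: the folder name relative_path_demo is invented, and setwd() stands in for what happens when an Rmd file located in scripts/ is knitted.

```r
# Throwaway project layout: <root>/scripts and <root>/data
root <- file.path(tempdir(), "relative_path_demo")
dir.create(file.path(root, "scripts"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)

# Pretend we are knitting an Rmd file that lives in <root>/scripts
old_wd <- setwd(file.path(root, "scripts"))

# One level up from scripts/, then down into data/
saveRDS(mtcars, "../data/dat.RDS")
saved_to <- normalizePath("../data/dat.RDS", winslash = "/")

# Restore the previous working directory
setwd(old_wd)
```

After this runs, saved_to holds the absolute location of the file, which is inside the data folder next to scripts, wherever the whole tree happens to be unpacked.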

You can run the Rmd file from the command line or from cron with Rscript -e 'rmarkdown::render("/home/johannes/Desktop/myGoodScripts/dataFetcher.Rmd")'.
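
If the directory is going to be unpacked in different places, the command can be assembled from the Rmd path instead of being typed by hand. The sketch below only builds and prints the text. The helper name render_command() and the daily 06:00 schedule are assumptions for illustration, and the path is the one from the command above.

```r
# Hypothetical helper: build the shell command that renders one Rmd file.
# shQuote(type = "sh") wraps the R expression in single quotes for the shell.
render_command <- function(rmd_path) {
  r_call <- sprintf('rmarkdown::render("%s")', rmd_path)
  paste("Rscript", "-e", shQuote(r_call, type = "sh"))
}

cmd <- render_command("/home/johannes/Desktop/myGoodScripts/dataFetcher.Rmd")

# Assumed schedule for the crontab entry: every day at 06:00
cron_line <- paste("0 6 * * *", cmd)
cat(cron_line, "\n")
```

Using the absolute path of the Rmd file in the command is what makes the job independent of the directory cron starts in.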
