r - knitr vs. interactive R behaviour

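I am reposting my problem here after noticing that this is the way the author of knitr recommends to get more help.

I am a bit puzzled by an .Rmd file that I can process line by line in an interactive R session, and also with `R CMD BATCH`, but that fails when using `knit("test.Rmd")`. I am not sure where the problem lies, so I did my best to narrow it down. Here is the example (in test.Rmd):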

```{r Rinit, include = FALSE, cache = FALSE}
opts_knit$set(stop_on_error = 2L)
library(adehabitatLT)
```

The functions to be used later:

```{r functions}
ld <- function(ltraj) {
    if (!inherits(ltraj, "ltraj"))
        stop("ltraj should be of class ltraj")
    inf <- infolocs(ltraj)
    df <- data.frame(
        x = unlist(lapply(ltraj, function(x) x$x)),
        y = unlist(lapply(ltraj, function(x) x$y)),
        date = unlist(lapply(ltraj, function(x) x$date)),
        dx = unlist(lapply(ltraj, function(x) x$dx)),
        dy = unlist(lapply(ltraj, function(x) x$dy)),
        dist = unlist(lapply(ltraj, function(x) x$dist)),
        dt = unlist(lapply(ltraj, function(x) x$dt)),
        R2n = unlist(lapply(ltraj, function(x) x$R2n)),
        abs.angle = unlist(lapply(ltraj, function(x) x$abs.angle)),
        rel.angle = unlist(lapply(ltraj, function(x) x$rel.angle)),
        id = rep(id(ltraj), sapply(ltraj, nrow)),
        burst = rep(burst(ltraj), sapply(ltraj, nrow)))
    class(df$date) <- c("POSIXct", "POSIXt")
    attr(df$date, "tzone") <- attr(ltraj[[1]]$date, "tzone")
    if (!is.null(inf)) {
        nc <- ncol(inf[[1]])
        infdf <- as.data.frame(matrix(nrow = nrow(df), ncol = nc))
        names(infdf) <- names(inf[[1]])
        for (i in 1:nc) infdf[[i]] <- unlist(lapply(inf, function(x) x[[i]]))
        df <- cbind(df, infdf)
    }
    return(df)
}
ltraj2sldf <- function(ltr, proj4string = CRS(as.character(NA))) {
    if (!inherits(ltr, "ltraj"))
        stop("ltr should be of class ltraj")
    df <- ld(ltr)
    df <- subset(df, !is.na(dist))
    coords <- data.frame(df[, c("x", "y", "dx", "dy")], id = as.numeric(row.names(df)))
    res <- apply(coords, 1, function(dfi) Lines(Line(matrix(c(dfi["x"],
        dfi["y"], dfi["x"] + dfi["dx"], dfi["y"] + dfi["dy"]),
        ncol = 2, byrow = TRUE)), ID = format(dfi["id"], scientific = FALSE)))
    res <- SpatialLinesDataFrame(SpatialLines(res, proj4string = proj4string),
        data = df)
    return(res)
}
```

I load the object and apply the `ltraj2sldf` function:

```{r fail}
load("tr.RData")
juvStp <- ltraj2sldf(trajjuv, proj4string = CRS("+init=epsg:32617"))
dim(juvStp)
```

Using `knit("test.Rmd")` fails with:

label: fail
Quitting from lines 66-75 (test.Rmd)
Error in SpatialLinesDataFrame(SpatialLines(res, proj4string =
proj4string),  (from     <text>#32) :
  row.names of data and Lines IDs do not match

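After the error, running the same call directly in the R console works as expected...

The problem is related to the way `format` produces the IDs (in the `apply` call of `ltraj2sldf`), just before ID 100,000. With an interactive call, R gives "99994", "99995", "99996", "99997", "99998", "99999", "100000"; with knitr, R gives "99994", " 99995", " 99996", " 99997", " 99998", " 99999", "100000", with additional leading spaces.

Is there any reason for this behaviour? Why would knitr behave differently from a direct call in R? I must admit I am having a hard time with this one, since I cannot debug it (it works in an interactive session)!

Any hint will be much appreciated. I can provide the .RData if it helps (the file is 4.5 MB), but I am mostly interested in why such a difference occurs. I tried without success to come up with a self-contained reproducible example, sorry about that. Thanks in advance for any contribution!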

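Following a comment from baptiste, here are more details about how the IDs are generated. Basically, an ID is generated for each row of the data frame by an `apply` call, which in turn uses `format` like this: `format(dfi["id"], scientific = FALSE)`. Here, the column `id` is simply a sequence from 1 to the number of rows (`1:nrow(df)`). `scientific = FALSE` is only there to make sure that 100000 does not come out as something like 1e+05.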
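One detail matters for what follows. In `ltraj2sldf` the `id` column is already numeric (`as.numeric(row.names(df))`), and even an integer column would be converted, because `apply` first turns the data frame into a matrix. Each row therefore reaches the function as a double vector, and `format` treats the ID as a real number rather than an integer. The sketch below illustrates this; the data frame `toy` is hypothetical.

```r
# Hypothetical toy data frame: two double columns plus an integer id.
toy <- data.frame(x = c(0.5, 1.5, 2.5), y = c(3.5, 4.5, 5.5), id = 1:3)

# The id column is stored as integer in the data frame...
id_type_in_df <- typeof(toy$id)

# ...but apply() coerces the data frame to a numeric matrix first, so the
# function receives each row as a named double vector.
id_type_in_apply <- unname(apply(toy, 1, function(row) typeof(row["id"])))

print(id_type_in_df)
print(id_type_in_apply)
```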

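After digging into the ID generation, the problem only occurs for the IDs shown in the first message, i.e. 99995 to 99999, which get a leading space. This should not happen with this `format` call, since I did not request a specific number of digits in the output. For instance: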

> format(99994:99999, scientific = FALSE)
[1] "99994" "99995" "99996" "99997" "99998" "99999"

However, if the IDs are generated in chunks, the following can happen:

> format(99994:100000, scientific = FALSE)
[1] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"

Note that processing the same values one at a time gives the expected result:

> for (i in 99994:100000) print(format(i, scientific = FALSE))
[1] "99994"
[1] "99995"
[1] "99996"
[1] "99997"
[1] "99998"
[1] "99999"
[1] "100000"

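In the end, it is as if the IDs were not prepared one at a time (as I would expect from a row-wise `apply` call) but, in this case, six at a time, and only when getting close to 1e+05. And, of course, only when using knitr, not in interactive or batch R.

Here is my session information: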
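To see exactly which IDs differ between the two environments, a small check for leading whitespace can be run both interactively and inside the knitted chunk. This is a minimal sketch: the helper name `padded_ids` and the sample vector are hypothetical and not part of the original code.

```r
# Hypothetical helper: return the IDs that start with whitespace.
padded_ids <- function(ids) {
    ids[grepl("^[[:space:]]", ids)]
}

# Sample IDs shaped like the ones reported above (hand-written, not computed).
sample_ids <- c("99994", " 99995", " 99999", "100000")
print(padded_ids(sample_ids))
```

Applied to the IDs built in `ltraj2sldf`, a check like this would list the entries that can no longer match `row.names(df)`.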

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] knitr_1.2           adehabitatLT_0.3.12 CircStats_0.2-4
[4] boot_1.3-9          MASS_7.3-27         adehabitatMA_0.3.6
[7] ade4_1.5-2          sp_1.0-11           basr_0.5.3

loaded via a namespace (and not attached):
[1] digest_0.6.3    evaluate_0.4.4  formatR_0.8     fortunes_1.5-0
[5] grid_3.0.1      lattice_0.20-15 stringr_0.6.2   tools_3.0.1

Best answer

Jeff and baptiste were indeed right! This is an options problem, related to the `digits` argument. I managed to come up with a working minimal example (e.g. in test.Rmd):

Simple reproducible example: `df1` is a data frame of 110,000 rows,
with 2 random normal variables plus an `id` variable, which is a
sequence from 1 to the number of rows.

```{r example}
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
```

From this, we create an `id2` variable using `format` with
`scientific = FALSE` to get fully written-out numbers instead of
scientific notation (e.g. 100000 instead of 1e+05):

```{r example-continued}
df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df1$id2[99990:100010]
```

Run interactively in R, this works as expected and gives:

 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
 [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

However, the results are quite different when using `knit`:

> library(knitr)
> knit("test.Rmd")

[...]

##  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
##  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
## [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

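Note the additional leading spaces after 99994. The difference actually comes from the `digits` option, as rightly suggested by Jeff: R uses 7 by default, while knitr uses 4. This difference affects the output of `format`, although I do not really understand what is going on here. R style: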

> options(digits = 7)
> format(99999, scientific = FALSE)
[1] "99999"

knitr style:

> options(digits = 4)
> format(99999, scientific = FALSE)
[1] " 99999"

But this affects all numbers, not only those after 99994 (well, to be honest, I do not even understand why it adds leading spaces at all):

> options(digits = 4)
> format(c(1:10, 99990:100000), scientific = FALSE)
 [1] "     1" "     2" "     3" "     4" "     5" "     6" "     7"
 [8] "     8" "     9" "    10" " 99990" " 99991" " 99992" " 99993"
[15] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"
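A likely explanation, offered as a sketch rather than a reading of R's source code: when `format` lays out a double, it appears to size the field from the value rounded to `digits` significant digits. With `digits = 4`, every value from 99995 to 99999 rounds to 100000, which needs six characters, so the five-digit number is printed in a six-character field. `signif` is used below only to illustrate that rounding; it is an assumption that `format` rounds in the same way. Note also that the last example above formats an integer vector containing 100000, so its padding is ordinary common-width justification and does not depend on `digits`.

```r
ids <- c(99994, 99995, 99999, 100000)

# Round each value to 4 and to 7 significant digits.
rounded4 <- signif(ids, 4)
rounded7 <- signif(ids, 7)

# Number of integer digits needed to print the rounded values.
width4 <- nchar(sprintf("%.0f", rounded4))
width7 <- nchar(sprintf("%.0f", rounded7))

print(data.frame(id = ids, rounded4, width4, rounded7, width7))
```

Under this reading, 99994 keeps a width of five with `digits = 4` because it rounds down to 99990, which matches the point where the leading spaces start in the knitted output.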

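From this, I cannot tell which one is at fault: knitr, `apply` or `format`? At least I came up with a workaround, using the `trim = TRUE` argument of `format`. It does not address the cause of the problem, but it does remove the leading spaces from the results...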
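The `trim = TRUE` workaround can be sketched as follows, together with a vectorised variant that avoids the row-wise `apply` altogether. The helper names `make_id` and `make_ids` are hypothetical. Passing `digits = 4` explicitly stands in for the `digits` setting that knitr applied in this session.

```r
# Format a single ID without justification padding (trim = TRUE) and
# without scientific notation.
make_id <- function(id, digits = getOption("digits")) {
    format(id, scientific = FALSE, trim = TRUE, digits = digits)
}

# Vectorised variant: integer IDs formatted with "%d" are not padded and
# do not switch to scientific notation.
make_ids <- function(n) {
    sprintf("%d", seq_len(n))
}

print(make_id(99999, digits = 4))
print(make_id(99999, digits = 7))
print(make_ids(100000)[99998:100000])
```

Alternatively, calling `options(digits = 7)` in the first chunk of the .Rmd file should restore the interactive default for the rest of the document, at the cost of also changing how other numbers are printed.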

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/17866230/