Problem description
When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
But:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back with as.POSIXlt(y["t2"])?
Edit
My df has two timestamps (say, t2 and t3) and some other fields (say, v1 and v2). For each row with a given t2, I want to find the k (e.g. 3) rows whose t3 is closest to, but lower than, t2 (and that have the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there a better way to do such things in R?
Recommended answer
Let's wrap up multiple comments into an explanation.
- Using apply converts a data.frame to a matrix (via as.matrix). This means that the least restrictive class will be used for every value. The least restrictive class in this case is character.
- You're supplying 1 to apply's MARGIN argument. This applies the function by row and makes you even worse off, because each row now mixes values from columns of different classes. In this scenario you're using apply, which is designed for matrices and data.frames, on what is really a vector. This is not the right tool for the job.
- In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
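The coercion described above can be checked directly. The following is a minimal sketch, assuming the same df as in the question; the tz = "UTC" argument and the names m and row_classes are additions for illustration. It shows that as.matrix(df), which is what apply works on, is a character matrix, while indexing the data.frame by row number keeps the column's class.

```r
df <- data.frame(v = 1:10, t = 1:10)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")  # tz is an assumption

# apply() coerces a data.frame with as.matrix();
# the date-time column forces the whole matrix to character
m <- as.matrix(df)
typeof(m)
## [1] "character"

# Iterating over row indices leaves the data.frame untouched,
# so each element of t2 keeps its class
row_classes <- lapply(seq_len(nrow(df)), function(i) class(df$t2[i]))
row_classes[[1]]
## [1] "POSIXct" "POSIXt"
```

Looping over seq_len(nrow(df)) is only one way to avoid the matrix conversion; Map or mapply over the individual columns would work in a similar way.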
In general, you choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
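For the use case in the question's edit, one option is to loop over row indices instead of calling apply, so that the timestamps are never converted. The sketch below is only an illustration under stated assumptions: the example data, the body of f, the default k = 3, and the choice to return NA when no row qualifies are all hypothetical, since the question does not show its own f.

```r
# Hypothetical example data; column names follow the question's description
df <- data.frame(
  v1 = c("a", "a", "a", "a", "b", "b"),
  v2 = c(1, 2, 3, 4, 10, 20),
  t2 = as.POSIXct(c(50, 40, 30, 20, 50, 40), origin = "2013-08-13", tz = "UTC"),
  t3 = as.POSIXct(c(10, 20, 30, 40, 10, 20), origin = "2013-08-13", tz = "UTC")
)

# Hypothetical f: mean of v2 over the k rows whose t3 is closest to,
# but lower than, t2 and whose v1 matches; NA when no row qualifies
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]  # latest t3 first
  mean(head(cand$v2, k))
}

# Loop over row indices so t2 stays POSIXct and v1 keeps its own type
res <- vapply(
  seq_len(nrow(df)),
  function(i) f(df$t2[i], df$v1[i], df),
  numeric(1)
)
res
## [1]  3.0  2.0  1.5  1.0 15.0 15.0
```

vapply is used here instead of sapply so that the result is guaranteed to be a numeric vector with one value per row.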
May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.