问题描述
对于数据框中的给定列,我想构造一个新矢量,该矢量的每个点都由任一侧的点的平均值组成.但是,对于最后一次观察,它将倒数第二次.对于第一个观察,它将是第二个.我编写了此R代码来解决此问题,但是我反复调用它,它非常慢.有人可以提供一些有关如何更有效地进行操作的提示吗?谢谢.
For a given column in a dataframe, I want to construct a new vector which for each point consists of the average of the points on either side. However for the last observation it will instead be the second to last. And for the first observation it will be second. I wrote this R code to solve the issue, however I am calling it repeatedly and it is extremely slow. Can someone give some tips on how to do it more efficiently? Thanks.
x1 <- c(rep('a',100),rep('b',100),rep('c',100))
x2 <- rnorm(300)
x <- data.frame(x1,x2)
names(x) <- c('col1','data1')
a.linear.interpolation <- function(x) {
require(zoo)
require(data.table)
a.dattab <- data.table(x)
setkey(a.dattab,col1)
#replace any NA values using LOCF / NOCB
a.dattab[,data1:=na.locf(data1,na.rm=FALSE),by=list(col1)]
a.dattab[,data1:=na.locf(data1,na.rm=FALSE,fromLast=TRUE),by=list(col1)]
#Adding a within group sequence number and a size of group field to facilitate
#row by row processing
a.dattab[,grpseq:=seq_len(.N),by=list(col1)]
a.dattab[,grpseq_max:=.N,by=list(col1)]
#convert back to data.frame
#data.frame seems faster than data.table for this row by row type processing
a.df <- data.frame(a.dattab)
new.col <- vector(length=nrow(a.df))
for(i in seq(nrow(a.df))){
if(a.df[i,"grpseq"]==1){
new.col[i] <- a.df[i+1,"data1"]
}
else if(a.df[i,"grpseq"]==a.df[i,"grpseq_max"]){
new.col[i] <- a.df[i-1,"data1"]
}
else {
new.col[i] <- (a.df[i-1,"data1"]+a.df[i+1,"data1"])/2
}
}
return(new.col)
}
推荐答案
除了使用rollmeans
,基本R filter
函数也可以执行此类操作.例如:
Apart from using rollmeans
, the base R filter
function can do this sort of thing as well. E.g.:
linint <- function(vec) {
c(vec[2], filter(vec, c(0.5, 0, 0.5))[-c(1, length(vec))], vec[length(vec) - 1])
}
x <- c(1,3,6,10,1)
linint(x)
#[1] 3.0 3.5 6.5 3.5 10.0
很快,只需不到一秒钟的时间就可以咀嚼一千万个案例:
And it's pretty quick, chewing through 10M cases in less than a second:
x <- rnorm(1e7)
system.time(linint(x))
#user system elapsed
#0.57 0.18 0.75
这篇关于如何提高线性插值的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!