I would like to linearly interpolate a variable in a data frame, taking into account: 1) the time difference between the two points, 2) the moment when the data was taken, and 3) the individual on which the variable was measured.
For example, given the following data frame:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df
I would like to obtain:
result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                     Individuals=c(1,1,1,1,1,1,1,2,2,2),
                     Value=c(1, 2, 3, 4, 5, 6, 7, 5, 6, 7))
result
I cannot simply apply the function na.approx from the package zoo to the whole column, because the observations are not all consecutive: some belong to one individual and others belong to other individuals. If the first observation of the second individual were NA and I applied na.approx to the whole column, I would be using information from individual==1 to interpolate the NA of individual==2 (e.g. the next data frame would have this error):
df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                   Individuals=c(1,1,1,1,1,1,1,2,2,2),
                   Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
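To make the problem concrete, here is a minimal sketch (assuming zoo is installed) of what happens when na.approx is applied to the whole Value column of df_2 without any grouping. The data frame is repeated so the snippet is self-contained, and the name ungrouped is only illustrative.

```r
library(zoo)

df_2 <- data.frame(time = c(1,2,3,4,5,6,7,1,2,3),
                   Individuals = c(1,1,1,1,1,1,1,2,2,2),
                   Value = c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))

# Interpolate the whole column at once, ignoring Individuals
ungrouped <- na.approx(df_2$Value, na.rm = FALSE)

# Row 8 is the first observation of individual 2, yet it gets filled from
# the last value of individual 1 (7) and the next value of individual 2 (5)
ungrouped[8]
```

Here row 8 comes out as 6, the midpoint of 7 and 5, even though the value 7 belongs to a different individual.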
I have tried using the packages zoo and dplyr:
library(dplyr)
library(zoo)
proof <- df %>%
  group_by(Individuals) %>%
  na.approx(df$Value)
But I cannot perform group_by in a zoo object.
Do you know how to interpolate NA values in one variable by groups?
Thanks in advance.
Use data.frame, rather than cbind, to create your data. cbind returns a matrix, but you need a data frame for dplyr. Then group by Individuals and use na.approx inside mutate, so that each individual is interpolated separately. na.rm=FALSE keeps leading and trailing NA values, which cannot be interpolated, so the result has the same length as the input.
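As a quick check of the cbind point, the following sketch uses only base R. The object names m, d and d_from_m are made up for illustration.

```r
# cbind() on plain vectors builds a matrix; data.frame() builds a data frame
m <- cbind(time = c(1, 2, 3), Value = c(1, NA, 3))
d <- data.frame(time = c(1, 2, 3), Value = c(1, NA, 3))

is.matrix(m)       # TRUE
is.data.frame(m)   # FALSE
is.data.frame(d)   # TRUE

# An existing matrix can be converted before using dplyr verbs on it
d_from_m <- as.data.frame(m)
```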
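df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))
library(dplyr)
library(zoo)
df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10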
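The call above treats the rows within each individual as equally spaced. If the time steps are irregular and the time difference should be taken into account, one option is to pass the time column as the x argument of na.approx. This is a sketch under that assumption; df_irregular and its values are invented for illustration and are not from the question.

```r
library(dplyr)
library(zoo)

# Hypothetical data with unevenly spaced times within each individual
df_irregular <- data.frame(
  time        = c(1, 2, 5, 1, 4, 5),
  Individuals = c(1, 1, 1, 2, 2, 2),
  Value       = c(10, NA, 50, NA, 4, 5)
)

out <- df_irregular %>%
  group_by(Individuals) %>%
  # x = time makes the interpolation proportional to elapsed time
  mutate(ValueInterp = na.approx(Value, x = time, na.rm = FALSE)) %>%
  ungroup()

out
```

For individual 1, the missing value at time 2 lies a quarter of the way from time 1 to time 5, so it is filled with 20 rather than the midpoint 30. The leading NA of individual 2 stays NA because there is no earlier point to interpolate from.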
Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. We use mutate_at to run na.approx on all columns whose name includes "Value". list(interp=na.approx) tells mutate_at to create new columns by running na.approx and appending interp as a suffix to the original column names:
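df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)
df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)

    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20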
If you don't want to preserve the original, uninterpolated columns, you can do:
df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
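In dplyr 1.0.0 and later, mutate_at is superseded by across(). A roughly equivalent sketch of the two mutate_at calls above, assuming a recent dplyr, could look like this (the object names with_suffix and in_place are illustrative):

```r
library(dplyr)
library(zoo)

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

# Keep the original columns and add Value1_interp / Value2_interp
with_suffix <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm = FALSE),
                .names = "{.col}_interp")) %>%
  ungroup()

# Overwrite the original columns instead of adding new ones
in_place <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"), ~ na.approx(.x, na.rm = FALSE))) %>%
  ungroup()
```

The trailing ungroup() is optional; it simply returns an ungrouped tibble so later steps are not silently applied per individual.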