Problem description
I have a data set with columns for ID, year, and income. I am trying to interpolate the yearly values to quarterly values.
id = c(2, 2, 2, 3, 3, 3,4,4,4,5,5)
year = c(2000, 2001, 2002, 2000,2001,2002, 2000,2001,2002,2000,2002)
income = c(20, 24, 26, 30,34,36, 40,46,48,53,56)
df = data.frame(id, year, income)
For example, I want the interpolated income for the year-quarters 2000Q1, 2000Q2, 2000Q3, 2000Q4, 2001Q1, ..., 2001Q4. The resulting data frame would have the columns id, year-quarter, and income, where income holds the interpolated values.
I realize that when interpolating linearly, the trend must be based only on the values for the respective ID. Any suggestions on how to do this interpolation in R?
Recommended answer
Here's an example using dplyr:
library(dplyr)

annual_data <- data.frame(
  person = c(1, 1, 1, 2, 2),
  year = c(2010, 2011, 2012, 2010, 2012),
  y = c(1, 2, 3, 1, 3)
)

# Build a full year x quarter grid for one person and place each
# annual value at quarter 1 of its year; all other quarters become NA.
expand_data <- function(x) {
  years <- min(x$year):max(x$year)
  quarters <- 1:4
  grid <- expand.grid(quarter = quarters, year = years)
  x$quarter <- 1
  merged <- grid %>% left_join(x, by = c('year', 'quarter'))
  merged$person <- x$person[1]
  return(merged)
}

# Linearly interpolate the NA entries of y by row position.
interpolate_data <- function(data) {
  xout <- 1:nrow(data)
  y <- data$y
  interpolation <- approx(x = xout[!is.na(y)], y = y[!is.na(y)], xout = xout)
  data$yhat <- interpolation$y
  return(data)
}

expand_and_interpolate <- function(x) interpolate_data(expand_data(x))

# Apply the expansion and interpolation separately to each person.
quarterly_data <- annual_data %>% group_by(person) %>% do(expand_and_interpolate(.))
print(as.data.frame(quarterly_data))
The output of this approach is:
   quarter year person  y yhat
1        1 2010      1  1 1.00
2        2 2010      1 NA 1.25
3        3 2010      1 NA 1.50
4        4 2010      1 NA 1.75
5        1 2011      1  2 2.00
6        2 2011      1 NA 2.25
7        3 2011      1 NA 2.50
8        4 2011      1 NA 2.75
9        1 2012      1  3 3.00
10       2 2012      1 NA   NA
11       3 2012      1 NA   NA
12       4 2012      1 NA   NA
13       1 2010      2  1 1.00
14       2 2010      2 NA 1.25
15       3 2010      2 NA 1.50
16       4 2010      2 NA 1.75
17       1 2011      2 NA 2.00
18       2 2011      2 NA 2.25
19       3 2011      2 NA 2.50
20       4 2011      2 NA 2.75
21       1 2012      2  3 3.00
22       2 2012      2 NA   NA
23       3 2012      2 NA   NA
24       4 2012      2 NA   NA
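The same idea can be applied to the data frame from the question. What follows is only a minimal base-R sketch, not the answer's own code. It assumes, as above, that each annual income sits at Q1 of its year. The helper name to_quarterly and the year_quarter label format are hypothetical choices made for illustration.

```r
# Data from the question
df <- data.frame(
  id     = c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  year   = c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002),
  income = c(20, 24, 26, 30, 34, 36, 40, 46, 48, 53, 56)
)

# Hypothetical helper: interpolate one id's annual values onto a quarterly grid.
# Assumption: each annual value is anchored at Q1 of its year.
to_quarterly <- function(d) {
  years <- seq(min(d$year), max(d$year))
  grid <- expand.grid(quarter = 1:4, year = years)
  # Time in fractional years: Q1 = year, Q2 = year + 0.25, and so on
  t_out <- grid$year + (grid$quarter - 1) / 4
  fit <- approx(x = d$year, y = d$income, xout = t_out)
  data.frame(
    id = d$id[1],
    year_quarter = paste0(grid$year, "Q", grid$quarter),
    income = fit$y
  )
}

# Interpolate within each id separately, then stack the pieces
quarterly <- do.call(rbind, lapply(split(df, df$id), to_quarterly))
rownames(quarterly) <- NULL
print(head(quarterly, 8))
```

Splitting by id before calling approx keeps each trend within its own ID, which is what the question asks for. The quarters after the last annual observation (for example 2002Q2 to 2002Q4) come out as NA, because approx does not extrapolate by default.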
There are probably a bunch of ways to clean this up. The key functions being used are expand.grid, approx, and dplyr::group_by. The approx function is a little tricky. Looking at the implementation of zoo::na.approx.default was quite helpful in figuring out how to work with approx.
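As a small illustration of why approx can be tricky, here is a sketch of how it treats points outside the range of the known values. The toy vector is made up for illustration. By default (rule = 1) approx returns NA outside that range, while rule = 2 repeats the nearest known value instead.

```r
y <- c(1, NA, NA, NA, 2, NA, NA)   # toy series: known at positions 1 and 5
idx <- seq_along(y)
known <- !is.na(y)

# Default (rule = 1): no extrapolation, so positions 6 and 7 stay NA
default_fit <- approx(x = idx[known], y = y[known], xout = idx)$y

# rule = 2: positions past the last known point repeat the closest known value
carried_fit <- approx(x = idx[known], y = y[known], xout = idx, rule = 2)$y

print(default_fit)
print(carried_fit)
```

Whether carrying the last value forward is appropriate depends on the data. For income it may be safer to leave the trailing quarters as NA than to assume a flat value.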