


I have a data set that has a list of IDs, year, and income. I am trying to interpolate the yearly values to quarterly values.

id = c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
year = c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002)
income = c(20, 24, 26, 30, 34, 36, 40, 46, 48, 53, 56)
df = data.frame(id, year, income)


For example, I would like the interpolated income for each year-quarter: 2000Q1, 2000Q2, 2000Q3, 2000Q4, 2001Q1, ... , 2001Q4. The resulting data frame would have the columns id, year-quarter, and income, where income is the interpolated value.


I realize that the linear interpolation has to be done within each ID, so that the trend is based only on that ID's own values. Any suggestions on how I would do the interpolation in R?


Here's an example using dplyr:


library(dplyr)

# Annual observations; person 2 has no value for 2011
annual_data <- data.frame(
    person=c(1, 1, 1, 2, 2),
    year=c(2010, 2011, 2012, 2010, 2012),
    y=c(1, 2, 3, 1, 3)
)

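# Build the full year x quarter grid for one person and attach each annual
# value to quarter 1 of its year; every other quarter gets NA
expand_data <- function(x) {
    years <- min(x$year):max(x$year)
    quarters <- 1:4
    grid <- expand.grid(quarter=quarters, year=years)
    x$quarter <- 1
    merged <- grid %>% left_join(x, by=c('year', 'quarter'))
    merged$person <- x$person[1]
    merged
}
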
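# Interpolate linearly over row positions, using the non-missing y values as knots
interpolate_data <- function(data) {
    xout <- 1:nrow(data)
    y <- data$y
    # approx() does not extrapolate by default, so rows after the last known value stay NA
    interpolation <- approx(x=xout[!is.na(y)], y=y[!is.na(y)], xout=xout)
    data$yhat <- interpolation$y
    data
}
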
expand_and_interpolate <- function(x) interpolate_data(expand_data(x))

quarterly_data <- annual_data %>% group_by(person) %>% do(expand_and_interpolate(.))



   quarter year person  y yhat
1        1 2010      1  1 1.00
2        2 2010      1 NA 1.25
3        3 2010      1 NA 1.50
4        4 2010      1 NA 1.75
5        1 2011      1  2 2.00
6        2 2011      1 NA 2.25
7        3 2011      1 NA 2.50
8        4 2011      1 NA 2.75
9        1 2012      1  3 3.00
10       2 2012      1 NA   NA
11       3 2012      1 NA   NA
12       4 2012      1 NA   NA
13       1 2010      2  1 1.00
14       2 2010      2 NA 1.25
15       3 2010      2 NA 1.50
16       4 2010      2 NA 1.75
17       1 2011      2 NA 2.00
18       2 2011      2 NA 2.25
19       3 2011      2 NA 2.50
20       4 2011      2 NA 2.75
21       1 2012      2  3 3.00
22       2 2012      2 NA   NA
23       3 2012      2 NA   NA
24       4 2012      2 NA   NA
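
The NA values in yhat for the last three quarters of 2012 come from approx's default rule = 1, which returns NA for any xout outside the range of the known points. Here is a small base R sketch of that behaviour for one person. The variable names are made up for illustration, and switching to rule = 2 (carry the last known value forward) is only one possible choice, not necessarily what you want for income data.

```r
# Known annual values sit at row positions 1, 5 and 9 (Q1 of each year)
x_known <- c(1, 5, 9)
y_known <- c(1, 2, 3)
xout <- 1:12

# Default rule = 1: positions outside [1, 9] come back as NA
inside_only <- approx(x = x_known, y = y_known, xout = xout)$y

# rule = 2: positions outside the range take the nearest known value
carried <- approx(x = x_known, y = y_known, xout = xout, rule = 2)$y

print(inside_only)
print(carried)
```

With rule = 2 the three trailing quarters are filled with the last annual value instead of NA; no trend is extrapolated.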

There are probably a bunch of ways to clean this up. The key functions being used are expand.grid, approx, and dplyr::group_by. The approx function is a little tricky. Looking at the implementation of zoo::na.approx.default was quite helpful in figuring out how to work with approx.
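
As one possible variation, here is a base R sketch applied to the data frame from the question. It skips the grid and the join and passes decimal years straight to approx. Two assumptions that are mine and not from the question: each annual income is placed at Q1 of its year, and the series stops at Q1 of the last year so nothing has to be extrapolated. The helper name interpolate_one and the column name year_quarter are made up for illustration.

```r
id <- c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
year <- c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002)
income <- c(20, 24, 26, 30, 34, 36, 40, 46, 48, 53, 56)
df <- data.frame(id, year, income)

# Interpolate one id at a time so each trend uses only that id's rows
interpolate_one <- function(d) {
    # Quarters as decimal years: 2000.00, 2000.25, 2000.50, ...
    # Assumption: the annual value belongs to Q1 of its year
    t_out <- seq(min(d$year), max(d$year), by = 0.25)
    data.frame(
        id = d$id[1],
        year_quarter = paste0(floor(t_out), "Q", (t_out %% 1) * 4 + 1),
        income = approx(x = d$year, y = d$income, xout = t_out)$y
    )
}

# Split by id, interpolate each piece, and stack the results
quarterly <- do.call(rbind, lapply(split(df, df$id), interpolate_one))
rownames(quarterly) <- NULL

print(head(quarterly, 9))
```

Because approx only needs the known (year, income) pairs, id 5 with its missing 2001 row is handled without any special casing: the line just runs from 2000 to 2002.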

