本文介绍了根据条件合并一列匹配的两个数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
模拟数据:
set.seed(1)
df1 <- data.frame(country=c("US", "UK"),
year=c(2000, 2003))
df2 <- data.frame(country=rep(c("US", "UK"), 10),
year=rep(2000:2009, 2),
myvar=rnorm(20))
df1
包含感兴趣的国家年份。我想要获得这个国家年份和3年前后的 myvar
的价值。
df1
contains the country-year of interest. I want to get the myvar
values for this country-year AND 3 years before and after.
换句话说,合并是根据 df2 $ country == df1 $ country
AND df2 $ year> df1 $ year - 3& df2 $ year< df1 $ year + 3
In other words, the merging is done based on condition that df2$country==df1$country
AND df2$year > df1$year - 3 & df2$year < df1$year + 3
编辑:我的(工作,不优雅)解决方案是填充 df1
创建我感兴趣的所有国家年份,然后以 df2
以常规方式合并。
My (working, not elegant) solution is to pad df1
to create all the country-years that I'm interested in, then merge with df2
the regular way.
library(plyr)
ddply(df1, c("country", "year"),
function(df) data.frame(rep(df$country, 7), (df$year-3):(df$year+3)))
生成
country year rep.df.country..7. X.df.year...3...df.year...3.
1 UK 2003 UK 2000
2 UK 2003 UK 2001
3 UK 2003 UK 2002
4 UK 2003 UK 2003
5 UK 2003 UK 2004
6 UK 2003 UK 2005
7 UK 2003 UK 2006
8 US 2000 US 1997
9 US 2000 US 1998
10 US 2000 US 1999
11 US 2000 US 2000
12 US 2000 US 2001
13 US 2000 US 2002
14 US 2000 US 2003
推荐答案
在data.table中使用foverlaps的试用
A trial using foverlaps in data.table
set.seed(1)
df1 <- data.frame(country=c("US", "UK"),
year=c(2000, 2003, 2009, 2009))
df2 <- data.frame(country=rep(c("US", "UK"), 10),
year=rep(2000:2009, 2),
myvar=rnorm(20))
library(data.table)
setDT(df1); setDT(df2) # convert to data table
df1[, c("start", "end") := list(year-2, year+2)]
setkey(df1, country, start, end)
setkey(df2[, year2:=year], country, year, year2)
foverlaps(df1, df2, type="any")[,4:7:=NULL][]
country year myvar
1: UK 2001 0.18364332
2: UK 2001 0.38984324
3: UK 2003 1.59528080
4: UK 2003 -2.21469989
5: UK 2005 -0.82046838
6: UK 2005 -0.04493361
7: UK 2007 0.73832471
8: UK 2007 0.94383621
9: UK 2009 -0.30538839
10: UK 2009 0.59390132
11: US 2000 -0.62645381
12: US 2000 1.51178117
13: US 2002 -0.83562861
14: US 2002 -0.62124058
15: US 2008 0.57578135
16: US 2008 0.82122120
这篇关于根据条件合并一列匹配的两个数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!