Problem description
I have a panel dataset, Panel, and a dataset with a list of events, Events. In the panel dataset, observations with the same panelID belong together.
panelID = c(1:50)
year= c(2001:2010)
country = c("NLD", "GRC", "GBR")
n <- 2
library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
country = rep(sample(country, length(panelID), replace = TRUE), each = n),
year = c(replicate(length(panelID), sample(year, n))),
some_NA = sample(0:5, 6),
some_NA_factor = sample(0:5, 6),
norm = round(runif(100)/10,2),
Income = round(rnorm(10,-5,5),2),
Happiness = sample(10,10),
Sex = round(rnorm(10,0.75,0.3),2),
Age = sample(100,100),
Educ = round(rnorm(10,0.75,0.3),2))
Panel[, uniqueID := .I] # Creates a unique ID
Panel[Panel == 0] <- NA
Events <- fread(
"Event_Type country year
A NLD 2005
A NLD 2004
A GBR 2006
A GBR 2003
A GRC 2002
A GRC 2007",
header = TRUE)
I want to know how often Events happen between the panel observations, split out per year. As an example, for the panel observations with panelID == 2, in the country NLD, there are two events in or between the years of that panel observation, namely in 2004 and 2005. Hence:
Desired output:
panelID country year 2002 2003 2004 2005 2006 2007
2       NLD     2004 NA   NA   1    1    NA   NA
2       NLD     2007 NA   NA   1    1    NA   NA
Based on the solution by Cole, I tried the following:
# cast Event
Events_cast <- reshape2::dcast(Events, country + year ~ year, length, value.var="year")
# update by reference for join later
Panel[, `:=`(start = min(year), end = max(year)), by = panelID]
# dcast sorts the rhs alphabetically
cols <- sort(unique(Events[['year']]))
# non-equi update join
Panel[Events_cast,
on = .(country,
start <= year,
end >= year),
(cols) := mget(cols)]
#clean up data frame
setnafill(Panel, fill = 0L, cols = cols)
Panel[, `:=`(start = NULL, end = NULL)]
Panel
However, I get an error at the # non-equi update join step:
Error in [.data.table (Panel, Events, on = .(country, : LHS of := appears to be column positions but are outside [1,ncol] range. New columns can only be added by name.
Recommended answer
data.table is trying to use the years to figure out which columns you are selecting. Because cols holds numbers rather than names, the error is telling you that 2006 and the other years are not valid column numbers. The fix is easy: convert the years to character so they are treated as column names:
cols <- as.character(sort(unique(Events[['year']])))
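To see the difference in isolation, here is a minimal sketch on a toy table. The names DT, cols_int, cols_chr and err_msg are made up for illustration and are not part of the question's data. It assumes only that data.table is installed, and shows that a numeric left-hand side of := is read as column positions, while a character left-hand side adds columns by name.

```r
library(data.table)

# Hypothetical toy table with a single column
DT <- data.table(id = 1:3)

# Integer years on the LHS of := are read as column positions,
# so this fails because DT has no column number 2004 or 2005
cols_int <- c(2004L, 2005L)
err_msg <- tryCatch(
  DT[, (cols_int) := list(0L, 0L)],
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Character years are read as column names, so new columns are added
cols_chr <- as.character(cols_int)
DT[, (cols_chr) := list(0L, 0L)]
print(names(DT))
```

The same reasoning applies to the update join in the question: once cols is a character vector, (cols) := mget(cols) refers to columns by name.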
And here's everything together with a few other changes, including:
- Using data.table::dcast instead of reshape2::dcast
- Adding start and end to the Events data.table and casting with those columns.
# cast Event
# Events_cast <- reshape2::dcast(Events, country + year ~ year, length, value.var="year")
Events[, `:=`(start = min(year), end = max(year)), by = country]
Events_cast <- dcast(Events, country + start + end~ year, length)
# update by reference for join later
Panel[, `:=`(start = min(year), end = max(year)), by = panelID]
# dcast sorts the rhs alphabetically
cols <- as.character(sort(unique(Events[['year']])))
# non-equi update join
# Panel[Events_cast,
# on = .(country,
# start <= year,
# end >= year),
# (cols) := mget(cols)]
Panel[Events_cast,
on = .(country,
start <= start,
end >= end),
(cols) := mget(cols)]
#clean up data frame
setnafill(Panel, fill = 0L, cols = cols)
Panel[, `:=`(start = NULL, end = NULL)]
Panel