Problem description
I am using the Synth package (see ftp://cran.r-project.org/pub/R/web/packages/Synth/Synth.pdf) in R.
Here is a subset of my data frame:
all_data_uk <- structure(list(countryno = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 16, 16,
16), country = c("Australia", "Australia", "Australia", "Canada",
"Canada", "Canada", "Denmark", "Denmark", "Denmark", "United Kingdom",
"United Kingdom", "United Kingdom"), year = c(1971, 1972, 1973,
1971, 1972, 1973, 1971, 1972, 1973, 1971, 1972, 1973), top10_income_share = c(0.2657,
0.2627, 0.2546, 0.37833, 0.37807, 0.37271, 0.323069660453, 0.322700285165,
0.320162826601, 0.2929, 0.289, 0.2831), top5_income_share = c(0.1655,
0.1654, 0.1593, 0.24075, 0.24106, 0.23917, 0.211599113574, 0.21160700537,
0.209096813051, 0.1881, 0.1848, 0.1818), top1_income_share = c(0.0557,
0.0573, 0.054, 0.08866, 0.08916, 0.08982, 0.082392548404, 0.0824267594074,
0.07776546085945, 0.0702, 0.0694, 0.0699), gdp_growth = structure(c(4.00330835508684,
3.91178191457604, 2.59931282534502, 4.11765761702448, 5.44585557970514,
6.96420291945871, 3.00503299618597, 3.92934382503836, 4.09292523611968,
3.48436803631409, 4.30194591910262, 6.50872079327365), label = "(annual %)", class = c("labelled",
"numeric")), capital_quinn = structure(c(50, 37.5, 37.5, 87.5, 87.5, 75, 75, 75, 75, 50, 50, 50), label = "(financial openness - capital account)", class = c("labelled",
"numeric"))), class = "data.frame", .Names = c("countryno", "country",
"year", "top10_income_share", "top5_income_share", "top1_income_share",
"gdp_growth", "capital_quinn"), row.names = c(NA, -12L))
In my reproducible example I have three different outcome variables "top10_income_share", "top5_income_share", "top1_income_share" (in my real problem I have way more) that I want to run the analysis with. "gdp_growth" and "capital_quinn" are my control variables.
For one outcome variable, here "top10_income_share", I have the following code (which works fine):
# Define treated and control units
control_units_top10 <- c(1,2)
treated_unit <- 16
# Run dataprep() which returns a list of matrices
dataprep.out_top10 <- dataprep(
foo = all_data_uk,
predictors = c("gdp_growth", "capital_quinn"),
predictors.op = "mean",
time.predictors.prior = 1971:1972,
special.predictors = list(
list("top10_income_share", 1971, "mean"),
list("top10_income_share", 1972, "mean")),
dependent = "top10_income_share",
unit.variable = "countryno",
unit.names.variable = "country",
time.variable = "year",
treatment.identifier = treated_unit,
controls.identifier = control_units_top10,
time.optimize.ssr = 1971:1972,
time.plot = 1971:1973)
# Run synth() command
synth.out_top10 <- synth(data.prep.obj = dataprep.out_top10, optimxmethod = "BFGS")
# Annual discrepancies in the top 10 income share trend between unit 16 (United Kingdom) and its synthetic counterpart:
gaps_top10 <- dataprep.out_top10$Y1plot - (dataprep.out_top10$Y0plot %*% synth.out_top10$solution.w)
I would like to loop over these commands and do the same analysis for all three outcome variables. My problem is that each time I have to adjust treatment.identifier, special.predictors and dependent. Furthermore, I would like to store the outputs (dataprep.out_top10, dataprep.out_top5, ...; synth.out_top10, synth.out_top5, ...) for all three outcome variables.
I found a similar question ("Save every R for loop iteration in a new list"); however, it used the same outcome and control variables in each iteration and only looped over the control units, and I did not succeed in applying its solution to my problem.
Here is what I have come up with so far:
control_units_top10 <- c(1,2)
control_units_top5 <- c(1,2,3)
control_units_top1 <- c(1,3)
treated_unit <- 16
for(top in c("top10", "top5", "top1"))
{
paste0("dataprep.out_", top) <- dataprep(
foo = all_data_uk,
predictors = c("gdp_growth", "capital_quinn"),
predictors.op = "mean",
time.predictors.prior = 1971:1972,
special.predictors = list(
list(paste0(top, "_income_share"), 1971, "mean"),
list(paste0(top, "_income_share"), 1972, "mean")),
dependent = paste0(top, "_income_share"),
unit.variable = "countryno",
unit.names.variable = "country",
time.variable = "year",
treatment.identifier = treated_unit,
controls.identifier = get(paste0("control_units_", top)),
time.optimize.ssr = 1971:1972,
time.plot = 1971:1973)
paste0("synth.out_", top) <- synth(data.prep.obj = dataprep.out, optimxmethod = "BFGS")
paste0("gaps_", top) <- paste0("dataprep.out_", top)$Y1plot - (paste0("dataprep.out_", top)$Y0plot %*% paste0("synth.out_", top)$solution.w)
}
I get the error:
Error in paste0("synth.out_", top) <- synth(data.prep.obj = dataprep.out, : target of assignment expands to non-language object
So I guess my paste0() approach does not work, but I could not find any other way to "index" the outcome variables and my new objects.
I am new to R and Stack Overflow and would be very happy about any tips on how to set up the loop.
Thanks in advance!
Recommended answer
I think I've got it.
It's a bit long, but that's mainly because I chose to break it up in a similar manner to what you did. It is of course possible to condense things and run the whole thing in a single loop.
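As a side note on the error itself: the left-hand side of <- has to be a variable name (or something like x[[i]]), while paste0() returns a character string, so R cannot assign to it. assign() and get() would work around this, but collecting results in a named list is usually easier to handle. The sketch below shows that pattern on made-up toy data; gap_for() is a hypothetical stand-in for the dataprep()/synth() steps and is not part of Synth.

```r
# Toy data standing in for all_data_uk (values are made up for illustration)
toy <- data.frame(
  countryno = c(1, 2, 3, 16),
  top10_income_share = c(0.26, 0.37, 0.32, 0.29),
  top5_income_share  = c(0.16, 0.24, 0.21, 0.18)
)

# Control units per outcome, keyed by the outcome's short name
control_units <- list(top10 = c(1, 2), top5 = c(1, 2, 3))

# Hypothetical stand-in for the dataprep()/synth() step:
# gap between the treated unit and the mean of its control units
gap_for <- function(data, outcome, controls, treated) {
  treated_value <- data[data$countryno == treated, outcome]
  control_mean  <- mean(data[data$countryno %in% controls, outcome])
  treated_value - control_mean
}

results <- list()
for (top in names(control_units)) {
  outcome <- paste0(top, "_income_share")
  # paste0() builds a string, which is fine as a list key
  results[[paste0("gaps_", top)]] <- gap_for(toy, outcome, control_units[[top]], 16)
}

names(results)
```

Each result can then be pulled out by name, for example results[["gaps_top5"]], instead of relying on separately named objects in the workspace.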
library(Synth)
# Create a vector of variable names
cnames <- colnames(all_data_uk)
outcome_var <- cnames[grepl("income_share", cnames)]
# Creating a 'wrapper function' for dataprep().
# Purely out of convenience, so we don't have to think about all
# the arguments that will stay the same.
prepfun <- function(VAR, control_units, treated_unit) {
  dataprep(
    foo = all_data_uk,
    predictors = c("gdp_growth", "capital_quinn"),
    predictors.op = "mean",
    time.predictors.prior = 1971:1972,
    special.predictors = list(
      list(VAR, 1971, "mean"),
      list(VAR, 1972, "mean")),
    dependent = VAR,
    unit.variable = "countryno",
    unit.names.variable = "country",
    time.variable = "year",
    treatment.identifier = treated_unit,
    controls.identifier = control_units,
    time.optimize.ssr = 1971:1972,
    time.plot = 1971:1973)
}
# Define treated and control units
treated_unit <- 16
control_units_top10 <- c(1,2)
control_units_top5 <- c(1,2,3)
control_units_top1 <- c(1,3)
control_list <- list(control_units_top10, control_units_top5, control_units_top1)
# Run dataprep() in a loop over both the variable names, and the list of
# control unit specifiers, returning a list of lists
dataprep_list <- mapply(prepfun, outcome_var, control_list, treated_unit,
SIMPLIFY=FALSE)
# Run synth() command
# In a loop over the dataprep list of lists
synth.out_list <- lapply(dataprep_list, synth, optimxmethod = "BFGS")
# simple summaries
lapply(synth.out_list, "[[", "solution.w")
lapply(synth.out_list, "[[", "solution.v")
# Annual discrepancies in each income share trend between unit
# 16 (United Kingdom) and its synthetic counterpart:
# defining the discrepancy function
discr <- function(y1, y0, sw) {
y1 - (y0 %*% sw)
}
# getting a list of data and weights for each variable
y1_list <- lapply(dataprep_list, "[[", "Y1plot")
y0_list <- lapply(dataprep_list, "[[", "Y0plot")
sw_list <- lapply(synth.out_list, "[[", "solution.w")
# mapply takes a single function and several lists of arguments
discrepancies <- mapply(discr, y1_list, y0_list, sw_list, SIMPLIFY=FALSE)
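To make the mapply() calls above less opaque, here is a minimal sketch of how it pairs up its arguments: the i-th element of each input goes into the i-th call, and a length-one argument such as the treated unit is recycled. The function describe() is a hypothetical stand-in for prepfun() so that the example runs without Synth.

```r
# Hypothetical stand-in for prepfun(): just reports what it was called with
describe <- function(var, controls, treated) {
  paste0(var, ": treated ", treated,
         ", controls ", paste(controls, collapse = ","))
}

vars     <- c("top10_income_share", "top5_income_share")
controls <- list(c(1, 2), c(1, 2, 3))

# Call 1 gets vars[1] and controls[[1]], call 2 gets vars[2] and controls[[2]];
# the single treated unit 16 is reused in both calls
out <- mapply(describe, vars, controls, 16, SIMPLIFY = FALSE)

out[["top5_income_share"]]
```

Because the first argument is a character vector, the returned list is named after its elements, which is what makes the lookup by variable name in the last line possible.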
The lapply() and mapply() steps do not carry the outcome variable names through to the columns of the combined matrix, so if you want them in the final result you'll have to add them yourself.
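For matrix inputs, cbind() takes its column labels from the matrices themselves rather than from the names of the list, which is why the labels have to be set by hand. A small sketch with made-up numbers (the list gaps is purely illustrative and not output from synth()):

```r
# Two one-column matrices with year row names but no column names
gaps <- list(
  top10 = matrix(c(0.01, -0.02), ncol = 1,
                 dimnames = list(c("1971", "1972"), NULL)),
  top5  = matrix(c(0.03, -0.01), ncol = 1,
                 dimnames = list(c("1971", "1972"), NULL))
)

# Bind the columns side by side, then label them from the list names
combined <- do.call(cbind, gaps)
colnames(combined) <- names(gaps)

combined
```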
discrepancies <- do.call(cbind, discrepancies)
colnames(discrepancies) <- outcome_var
discrepancies
# top10_income_share top5_income_share top1_income_share
# 1971 0.0007806441 0.0016615564 0.0009574887
# 1972 -0.0007620713 -0.0016525268 -0.0009905464
# 1973 0.0007952133 0.0002760334 0.0011823801
If you have any questions about the *apply() functions, please just ask. I remember I had a really hard time getting my head around them when I first started using R.
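One idiom from the code above that tends to confuse at first is lapply(x, "[[", "name"). It simply applies the extraction operator [[ to every element of the list, passing "name" as the thing to extract. A minimal sketch, using a made-up list called fits in place of real synth() output:

```r
# Made-up stand-in for a list of synth() results
fits <- list(
  top10 = list(solution.w = c(0.7, 0.3),      loss = 0.01),
  top5  = list(solution.w = c(0.2, 0.5, 0.3), loss = 0.02)
)

# Short form: "[[" is used as the function, "solution.w" as its argument
weights <- lapply(fits, "[[", "solution.w")

# Equivalent long form with an explicit anonymous function
weights_long <- lapply(fits, function(fit) fit[["solution.w"]])

identical(weights, weights_long)
```

Both forms return a list with the same names as fits, so the weights for one outcome are available as weights[["top5"]].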