Problem description
I'm working on matching. I have created a basic dataset to learn with, and it looks like this:
propScore
group dep public
1 0 1 8
2 0 2 7
3 0 3 6
4 0 4 7
5 1 1 8
6 1 2 7
7 1 3 6
8 1 4 7
9 1 5 2
10 1 6 3
And I use:
m.out = matchit(group ~ dep + public, data = propScore, method = "nearest", ratio = 1)
but I obtain this match:
1
5 NA
6 "1"
7 "4"
8 NA
9 "3"
10 "2"
but I think the correct thing would be:
1
5 "1"
6 "2"
7 "3"
8 "4"
9 NA
10 NA
What am I doing wrong? Thanks
By default, matchit() estimates a propensity score for each unit using a logistic regression of the treatment on the covariates. These propensity scores are stored in the distance component of m.out. We can take a look at the data with the propensity scores included:
> cbind(propScore, ps = m.out$distance)
   group dep public        ps
1      0   1      8 0.3903012
2      0   2      7 0.5294948
3      0   3      6 0.6642472
4      0   4      7 0.4792577
5      1   1      8 0.3903012
6      1   2      7 0.5294948
7      1   3      6 0.6642472
8      1   4      7 0.4792577
9      1   5      2 0.9585154
10     1   6      3 0.9148828
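To make the estimation step concrete, here is a minimal base R sketch that fits the same kind of logistic regression by hand. It assumes the data frame below is a faithful copy of the table in the question, and it uses glm() directly rather than going through matchit(), so treat it as an illustration of the idea rather than the package's internal code.

```r
# Data from the question, rebuilt as a data frame.
propScore <- data.frame(
  group  = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  dep    = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6),
  public = c(8, 7, 6, 7, 8, 7, 6, 7, 2, 3)
)

# Logistic regression of the treatment indicator on the covariates.
fit <- glm(group ~ dep + public, data = propScore, family = binomial)

# The fitted probabilities are the estimated propensity scores.
ps <- fitted(fit)
cbind(propScore, ps = round(ps, 4))
```

Units with identical covariate values (for example units 2 and 6) necessarily get identical fitted probabilities, which is why the treated units 5 to 8 each have a "twin" among the controls.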
You may notice that units 6 and 2 have identical propensity scores because they have identical covariate values, and yet they were not matched to each other. This seems strange, but it has to do with the order in which matches are found when matching without replacement.
By default, matchit() performs matching in descending order of the treated units' propensity scores. Unit 9 has the largest propensity score (.959), so it gets matched first (to unit 3). Unit 10 is next, and it gets matched to unit 2 because unit 3 has already been matched to unit 9 and you are matching without replacement (meaning each control unit can be used only once). Even though units 10 and 2 are very far apart from each other, unit 2 is indeed the closest remaining control to unit 10 once unit 3 has been used. Unit 7 comes next and is matched to unit 4, the closer of the two controls still available (units 1 and 4). By the time we get to unit 6, only unit 1 is left, so unit 6 is matched with unit 1, and units 8 and 5 end up unmatched.
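The greedy procedure described above can be sketched in a few lines of base R. This is an illustration only: greedy_match() is a hypothetical helper written for this example, not a MatchIt function, and the propensity scores are copied from the output shown earlier rather than re-estimated.

```r
# Propensity scores copied from the output above; names are unit numbers.
ps <- c(`1` = 0.3903012, `2` = 0.5294948, `3` = 0.6642472, `4` = 0.4792577,
        `5` = 0.3903012, `6` = 0.5294948, `7` = 0.6642472, `8` = 0.4792577,
        `9` = 0.9585154, `10` = 0.9148828)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Hypothetical helper: 1:1 greedy nearest-neighbor matching without replacement.
greedy_match <- function(ps, group, decreasing = TRUE) {
  treated   <- names(ps)[group == 1]
  available <- names(ps)[group == 0]
  matches   <- setNames(rep(NA_character_, length(treated)), treated)
  # Visit treated units from largest to smallest score (or the reverse).
  for (u in treated[order(ps[treated], decreasing = decreasing)]) {
    if (length(available) == 0) break
    # Closest control among those not yet used.
    best <- available[which.min(abs(ps[available] - ps[u]))]
    matches[u] <- best
    available <- setdiff(available, best)  # each control is used only once
  }
  matches
}

greedy_match(ps, group)
```

With these scores the sketch pairs 9 with 3, 10 with 2, 7 with 4, and 6 with 1, and leaves 8 and 5 unmatched, which is the match matrix reported in the question.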
The point of matching this way is to give those treated units with the highest propensity score the best chance to find a relatively close match since those are likely to be the hardest to find matches for. This strategy doesn't always work, however, and sometimes you get weird matches like the one you found, where two identical units are not matched with each other.
You can change the order of matching by setting m.order = "smallest", which matches in ascending order of the propensity score. You should find that with this option, unit 5 is matched with unit 1, and unit 6 is matched with unit 2. You can also set m.order = "random", which matches in a random order. If you use this option, make sure you set a seed using set.seed() so your results are replicable.
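To see what these orderings mean for this data set, the following base R sketch lists the order in which the treated units would be visited under each option. It only illustrates the ordering, under the assumption that the scores are those printed earlier; it does not call matchit(), and the seed value is arbitrary.

```r
# Propensity scores of the treated units, copied from the output above.
ps_treated <- c(`5` = 0.3903012, `6` = 0.5294948, `7` = 0.6642472,
                `8` = 0.4792577, `9` = 0.9585154, `10` = 0.9148828)

# Default ordering: largest propensity score first.
largest_first <- names(sort(ps_treated, decreasing = TRUE))

# Ascending ordering, corresponding to m.order = "smallest".
smallest_first <- names(sort(ps_treated))

# Random ordering: fix the seed so the same order can be reproduced later.
set.seed(123)  # arbitrary seed chosen for this sketch
random_order <- sample(names(ps_treated))

largest_first   # "9" "10" "7" "6" "8" "5"
smallest_first  # "5" "8" "6" "7" "10" "9"
```

In ascending order the four treated units that have twins are visited before units 10 and 9, so each twin is still available when its partner is reached.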
As was mentioned in the comments, you can also perform matching with replacement by setting replace = TRUE. Because control units can now be reused for multiple matches, units 10, 9, and 7 will all be matched to unit 3, and unit 6 will be matched to its twin, unit 2.
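Matching with replacement is easy to illustrate by hand, because the order no longer matters: every treated unit simply takes its nearest control. The base R sketch below assumes the propensity scores printed earlier and is not MatchIt's own code.

```r
# Propensity scores copied from the output above, split by group.
ps_control <- c(`1` = 0.3903012, `2` = 0.5294948,
                `3` = 0.6642472, `4` = 0.4792577)
ps_treated <- c(`5` = 0.3903012, `6` = 0.5294948, `7` = 0.6642472,
                `8` = 0.4792577, `9` = 0.9585154, `10` = 0.9148828)

# With replacement, each treated unit independently picks its closest control.
nearest <- sapply(ps_treated, function(p) {
  names(ps_control)[which.min(abs(ps_control - p))]
})

nearest
```

Here units 5, 6, 7, and 8 get their twins (controls 1, 2, 3, and 4), while units 9 and 10 both reuse control 3.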
You can also set a caliper; this defines the maximum distance for an allowable match. In your original matchit() call, unit 10 and its closest control, unit 3, differ by .25 on the propensity score, which is a huge distance, making these units not very similar to each other. You can restrict the allowable matches to be within some distance of each other, measured in standard deviations of the propensity score. If you set a narrow caliper, e.g., caliper = .15, only units that are close to each other will be matched, and any treated unit that doesn't have a match within the caliper will be unmatched. Using a caliper of .15, units 9 and 10 don't receive matches, and the other treated units are matched with their twins in the control group.
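The effect of the caliper can be checked by hand with a short base R sketch. It assumes the propensity scores printed earlier, and it assumes the caliper width is 0.15 times the standard deviation of the propensity score over all ten units; how the package standardizes the width may differ in detail, but any reasonable choice gives a width far below .25 here.

```r
# Propensity scores copied from the output above, split by group.
ps_control <- c(`1` = 0.3903012, `2` = 0.5294948,
                `3` = 0.6642472, `4` = 0.4792577)
ps_treated <- c(`5` = 0.3903012, `6` = 0.5294948, `7` = 0.6642472,
                `8` = 0.4792577, `9` = 0.9585154, `10` = 0.9148828)

# Assumed caliper width: 0.15 standard deviations of all propensity scores
# (about 0.03 for this data).
caliper_width <- 0.15 * sd(c(ps_control, ps_treated))

# Distance from each treated unit to its closest control.
nearest_dist <- sapply(ps_treated, function(p) min(abs(ps_control - p)))

# Treated units that have at least one control inside the caliper.
within_caliper <- nearest_dist <= caliper_width
round(nearest_dist, 3)
within_caliper
```

Units 5 to 8 have a control at distance 0, so they stay eligible, while units 9 and 10 are about .29 and .25 away from their nearest control and therefore go unmatched.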