R - cox hazard model not including levels of a factor

Question

I am fitting a Cox model to some data structured as follows:

> str(test)
'data.frame':   147 obs. of  8 variables:
 $ AGE              : int  71 69 90 78 61 74 78 78 81 45 ...
 $ Gender           : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
 $ RACE             : Factor w/ 5 levels "","BLACK","HISPANIC",..: 5 2 5 5 5 5 5 5 5 1 ...
 $ SIDE             : Factor w/ 2 levels "L","R": 1 1 2 1 2 1 1 1 2 1 ...
 $ LESION.INDICATION: Factor w/ 12 levels "CLAUDICATION",..: 1 11 4 11 9 1 1 11 11 11 ...
 $ RUTH.CLASS       : int  3 5 4 5 4 3 3 5 5 5 ...
 $ LESION.TYPE      : Factor w/ 3 levels "","OCCLUSION",..: 3 3 2 3 3 3 2 3 3 3 ...
 $ Primary          : int  1190 1032 166 689 219 840 1063 115 810 157 ...

The RUTH.CLASS variable is actually a factor, so I converted it to one:

> test$RUTH.CLASS <- as.factor(test$RUTH.CLASS)
> summary(test$RUTH.CLASS)
 3  4  5  6
48 56 35  8

Great.

After fitting the model:

> stent.surv <- Surv(test$Primary)
> cox.ruthclass <- coxph(stent.surv ~ RUTH.CLASS, data=test )
>
> summary(cox.ruthclass)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS, data = test)

  n= 147, number of events= 147

              coef exp(coef) se(coef)     z Pr(>|z|)
RUTH.CLASS4 0.1599    1.1734   0.1987 0.804  0.42111
RUTH.CLASS5 0.5848    1.7947   0.2263 2.585  0.00974 **
RUTH.CLASS6 0.3624    1.4368   0.3846 0.942  0.34599
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4     1.173     0.8522    0.7948     1.732
RUTH.CLASS5     1.795     0.5572    1.1518     2.796
RUTH.CLASS6     1.437     0.6960    0.6762     3.053

Concordance= 0.574  (se = 0.026 )
Rsquare= 0.045   (max possible= 1 )
Likelihood ratio test= 6.71  on 3 df,   p=0.08156
Wald test            = 7.09  on 3 df,   p=0.06902
Score (logrank) test = 7.23  on 3 df,   p=0.06478

> levels(test$RUTH.CLASS)
[1] "3" "4" "5" "6"

When I fit more variables in the model, the same thing happens:

> cox.fit <- coxph(stent.surv ~ RUTH.CLASS + LESION.INDICATION + LESION.TYPE, data=test )
>
> summary(cox.fit)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS + LESION.INDICATION +
    LESION.TYPE, data = test)

  n= 147, number of events= 147

                                          coef exp(coef) se(coef)      z Pr(>|z|)
RUTH.CLASS4                            -0.5854    0.5569   1.1852 -0.494   0.6214
RUTH.CLASS5                            -0.1476    0.8627   1.0182 -0.145   0.8847
RUTH.CLASS6                            -0.4509    0.6370   1.0998 -0.410   0.6818
LESION.INDICATIONEMBOLIC               -0.4611    0.6306   1.5425 -0.299   0.7650
LESION.INDICATIONISCHEMIA               1.3794    3.9725   1.1541  1.195   0.2320
LESION.INDICATIONISCHEMIA/CLAUDICATION  0.2546    1.2899   1.0189  0.250   0.8027
LESION.INDICATIONREST PAIN              0.5302    1.6993   1.1853  0.447   0.6547
LESION.INDICATIONTISSUE LOSS            0.7793    2.1800   1.0254  0.760   0.4473
LESION.TYPEOCCLUSION                   -0.5886    0.5551   0.4360 -1.350   0.1770
LESION.TYPESTEN                        -0.7895    0.4541   0.4378 -1.803   0.0714 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                                       exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4                               0.5569     1.7956   0.05456     5.684
RUTH.CLASS5                               0.8627     1.1591   0.11726     6.348
RUTH.CLASS6                               0.6370     1.5698   0.07379     5.499
LESION.INDICATIONEMBOLIC                  0.6306     1.5858   0.03067    12.964
LESION.INDICATIONISCHEMIA                 3.9725     0.2517   0.41374    38.141
LESION.INDICATIONISCHEMIA/CLAUDICATION    1.2899     0.7752   0.17510     9.503
LESION.INDICATIONREST PAIN                1.6993     0.5885   0.16645    17.347
LESION.INDICATIONTISSUE LOSS              2.1800     0.4587   0.29216    16.266
LESION.TYPEOCCLUSION                      0.5551     1.8015   0.23619     1.305
LESION.TYPESTEN                           0.4541     2.2023   0.19250     1.071

Concordance= 0.619  (se = 0.028 )
Rsquare= 0.137   (max possible= 1 )
Likelihood ratio test= 21.6  on 10 df,   p=0.01726
Wald test            = 22.23  on 10 df,   p=0.01398
Score (logrank) test = 23.46  on 10 df,   p=0.009161

> levels(test$LESION.INDICATION)
[1] "CLAUDICATION"          "EMBOLIC"               "ISCHEMIA"              "ISCHEMIA/CLAUDICATION"
[5] "REST PAIN"             "TISSUE LOSS"
> levels(test$LESION.TYPE)
[1] ""          "OCCLUSION" "STEN"

Truncated output from model.matrix:

> model.matrix(cox.fit)
    RUTH.CLASS4 RUTH.CLASS5 RUTH.CLASS6 LESION.INDICATIONEMBOLIC LESION.INDICATIONISCHEMIA
1             0           0           0                        0                         0
2             0           1           0                        0                         0

We can see that the first level of each of these factors is being excluded from the model. Any input would be greatly appreciated. I noticed that for the LESION.TYPE variable the blank level "" is the one not being included, but that is not by design: it should be NA or something similar.

I'm confused and could use some help with this. Thanks.
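On the side point about the blank level: one common way to turn "" into a missing value is to assign NA to that level name. The sketch below uses a hypothetical stand-in vector rather than the real test$LESION.TYPE column, so treat it as an illustration only.

```r
# Hypothetical stand-in for test$LESION.TYPE, including the blank level from the question
lesion <- factor(c("STEN", "", "OCCLUSION", "STEN", ""))
print(levels(lesion))

# Recode the blank level as missing; assigning NA to a level name removes that level
levels(lesion)[levels(lesion) == ""] <- NA
print(levels(lesion))

# The observations that carried the blank level are now NA
print(sum(is.na(lesion)))
```

With R's default na.action (na.omit), rows with a missing predictor are then left out of the fit instead of silently forming the reference group.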

Recommended answer

A factor enters any model through contrasts: with k levels it contributes k - 1 coefficients, each measured relative to a base (reference) level. Your contrasts are the default treatment contrasts, which use the first level as that reference, so RUTH.CLASS3 is not missing from the model; it is what the other levels are compared against. There is no point in calculating a separate coefficient for the dropped level: the levels are exhaustive and mutually exclusive for every observation, so an observation in the dropped level is exactly the case where all the other indicator columns are 0, and the model already returns its prediction for that case. You can alter the default by changing contrasts in your options.
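To make the reference level visible, here is a minimal sketch that builds the design matrix for a toy data frame. The data frame `toy` is hypothetical (one row per RUTH.CLASS level from the question), and the coding is requested explicitly so the result does not depend on the session's options.

```r
# Hypothetical toy data: one observation per RUTH.CLASS level from the question
toy <- data.frame(RUTH.CLASS = factor(c("3", "4", "5", "6")))

# Build the design matrix with treatment (dummy) coding
X <- model.matrix(~ RUTH.CLASS, data = toy,
                  contrasts.arg = list(RUTH.CLASS = "contr.treatment"))

# Drop the intercept column: a Cox model has no intercept term,
# because it is absorbed into the baseline hazard
X <- X[, -1, drop = FALSE]
print(X)

# The row for level "3" is all zeros: it is the reference, not a missing level
ref_row <- X[toy$RUTH.CLASS == "3", ]
print(ref_row)
```

Each remaining column is an indicator for one non-reference level, so the coefficients in the summary above are log hazard ratios relative to level "3".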

To have the coefficients measured against the average over all levels of the factor:

options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))
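Note that sum-to-zero coding still produces only k - 1 coefficients; the effect of the last level is implied as minus the sum of the others. A small sketch, where the coefficient values in `b` are made up purely for illustration:

```r
# Sum-to-zero coding for a four-level factor (levels taken from the question)
lev <- c("3", "4", "5", "6")
C <- contr.sum(lev)
print(C)

# Hypothetical coefficients for the three fitted columns (not from any real fit)
b <- c(0.1, -0.2, 0.4)

# Implied effect of every level, including the last one that gets no column
effects <- drop(C %*% b)
print(effects)
```

The four implied effects sum to zero, which is what "versus the average" means here.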

To have the coefficients measured against a specific reference level (this is what you have above, and it is the default):

options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
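Under treatment contrasts the reference is the first level, and relevel() is one way to pick a different one for a single variable without touching the global option. The sketch below uses simulated data (the names `toy`, `grp` and `grp5` are hypothetical) and assumes the survival package is installed.

```r
library(survival)

# Simulated data, purely illustrative: exponential survival times, no censoring
set.seed(1)
n <- 200
grp <- factor(sample(c("3", "4", "5", "6"), n, replace = TRUE))
time <- rexp(n, rate = ifelse(grp == "5", 2, 1))
toy <- data.frame(time = time, status = 1, grp = grp)

# Default reference is the first level, "3"
fit_ref3 <- coxph(Surv(time, status) ~ grp, data = toy)

# relevel() makes "5" the reference; same model, different parameterisation
toy$grp5 <- relevel(toy$grp, ref = "5")
fit_ref5 <- coxph(Surv(time, status) ~ grp5, data = toy)

print(coef(fit_ref3))
print(coef(fit_ref5))
```

The coefficient for level "5" versus "3" in the first fit is the negative of the coefficient for "3" versus "5" in the second, and both fits have the same log-likelihood.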

As the option names show, there are two types of factor in R: unordered (categorical, e.g. red, green, blue) and ordered (e.g. strongly disagree, disagree, no opinion, agree, strongly agree).
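The two kinds of factor can be compared directly. This sketch uses made-up vectors (`colour`, `opinion`) and assumes the session still has R's default contrasts option.

```r
# Hypothetical examples of the two kinds of factor
colour  <- factor(c("red", "green", "blue"))
opinion <- factor(c("disagree", "agree", "no opinion"),
                  levels = c("disagree", "no opinion", "agree"),
                  ordered = TRUE)

# The global option holds one contrast function name per kind of factor
print(getOption("contrasts"))

# Unordered: treatment coding under the default options
print(contrasts(colour))

# Ordered: orthogonal polynomial coding
print(contrasts(opinion))
```

Either way a three-level factor yields two columns; only the meaning of the resulting coefficients differs.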
