Problem description
I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1, not take or take a product, respectively). My predictor variable is Thoughts and is continuous, can be positive or negative, and is rounded to the second decimal place. I want to know how the probability of taking the product changes as Thoughts changes.
The logistic regression equation is:
glm(Decision ~ Thoughts, family = binomial, data = data)
According to this model, Thoughts has a significant effect on the probability of Decision (b = .72, p = .02). To determine the odds ratio of Decision as a function of Thoughts:
exp(coef(results))
Odds ratio = 2.07.
Questions:
- How do I interpret the odds ratio?
- Does an odds ratio of 2.07 imply that a .01 increase (or decrease) in Thoughts changes the odds of taking (or not taking) the product by 0.07, OR
- Does it imply that as Thoughts increases (decreases) by .01, the odds of taking (not taking) the product increase (decrease) by approximately 2 units?
- How do I convert the odds ratio of Thoughts to an estimated probability of Decision? Or can I only estimate the probability of Decision at a certain Thoughts score (i.e. calculate the estimated probability of taking the product when Thoughts == 1)?
Recommended answer
The coefficients returned by a logistic regression in R are on the logit scale, that is, the log of the odds. To convert a logit coefficient to an odds ratio, you exponentiate it, as you've done above. To convert a logit to a probability, you can use the function exp(logit)/(1+exp(logit)). However, there are some things to note about this procedure.
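As a minimal sketch of the two conversions just described: the logit value below is made up purely for illustration, and plogis() is base R's logistic distribution function, which computes the same inverse-logit as the hand-written formula.

```r
# Hypothetical logit (log-odds) value, chosen only for illustration
logit <- 1.5

# Logit -> odds: exponentiate
odds <- exp(logit)

# Logit -> probability: inverse-logit, written out by hand
prob_manual <- exp(logit) / (1 + exp(logit))

# Same conversion using base R's plogis()
prob_builtin <- plogis(logit)

# Odds and probability are linked by odds = p / (1 - p)
odds_from_prob <- prob_manual / (1 - prob_manual)

print(c(odds = odds, prob_manual = prob_manual, prob_builtin = prob_builtin))
```

The hand-written formula and plogis() should agree, and converting the probability back to odds should recover exp(logit).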
First, I'll use some reproducible data to illustrate:
library('MASS')
data("menarche")
m<-glm(cbind(Menarche, Total-Menarche) ~ Age, family=binomial, data=menarche)
summary(m)
This returns:
Call:
glm(formula = cbind(Menarche, Total - Menarche) ~ Age, family = binomial,
    data = menarche)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0363  -0.9953  -0.4900   0.7780   1.3675

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.22639    0.77068  -27.54   <2e-16 ***
Age           1.63197    0.05895   27.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3693.884  on 24  degrees of freedom
Residual deviance:   26.703  on 23  degrees of freedom
AIC: 114.76

Number of Fisher Scoring iterations: 4
The coefficients displayed are on the logit scale, just as in your example. If we plot these data and this model, we see the sigmoidal function that is characteristic of a logistic model fit to binomial data:
# predict() gives the predicted value in terms of logits by default
plot.dat <- data.frame(prob = menarche$Menarche/menarche$Total,
                       age = menarche$Age,
                       fit = predict(m, menarche))
# convert those logit values to probabilities
plot.dat$fit_prob <- exp(plot.dat$fit)/(1+exp(plot.dat$fit))
library(ggplot2)
ggplot(plot.dat, aes(x=age, y=prob)) +
  geom_point() +
  geom_line(aes(x=age, y=fit_prob))
Note that the change in probabilities is not constant: the curve rises slowly at first, then more quickly in the middle, then levels out at the end. The difference in probabilities between ages 10 and 12 is far less than the difference in probabilities between ages 12 and 14. This means that it's impossible to summarise the relationship between age and probability with one number without transforming the probabilities.
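To make the non-constant change concrete, here is a small sketch that plugs the coefficients from the summary output above into the inverse-logit. The coefficients are copied by hand from the printed output, the ages 10, 12 and 14 are the ones mentioned in the text, and the variable names are only illustrative.

```r
# Coefficients copied from the summary(m) output above
intercept <- -21.22639
slope_age <- 1.63197

ages <- c(10, 12, 14)

# Linear predictor (logit) at each age, then convert to probability
logits <- intercept + slope_age * ages
probs  <- plogis(logits)

# Change in probability over each two-year step
prob_change <- diff(probs)

# Change in logit over the same steps is constant (2 * slope_age)
logit_change <- diff(logits)

print(round(data.frame(age = ages, logit = logits, prob = probs), 3))
print(round(prob_change, 3))
```

The logit rises by the same amount over each step, while the probability rises far more between 12 and 14 than between 10 and 12.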
To answer your specific questions:
The exponentiated intercept is the odds of a "success" (in your data, the odds of taking the product) when x = 0 (i.e. zero thoughts). The odds ratio for your coefficient is the factor by which those odds are multiplied when x increases by one whole unit (i.e. x = 1; one thought). Using the menarche data:
exp(coef(m))
(Intercept) Age
6.046358e-10 5.113931e+00
We could interpret this as the odds of menarche occurring at age = 0 being about 6e-10 (.0000000006), or basically impossible. Exponentiating the age coefficient tells us the expected multiplicative increase in the odds of menarche for each one-unit increase in age. In this case, it's just over a quintupling. An odds ratio of 1 indicates no change, whereas an odds ratio of 2 indicates a doubling, etc.
Your odds ratio of 2.07 implies that a 1-unit increase in Thoughts increases the odds of taking the product by a factor of 2.07.
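Because an odds ratio is multiplicative, the effect of a change smaller than one unit follows from scaling the coefficient before exponentiating. The sketch below starts from the odds ratio of 2.07 reported in the question and uses the .01 step the question asks about. The variable names are illustrative, not part of the original model.

```r
# Odds ratio per 1-unit increase in Thoughts, as reported in the question
or_per_unit <- 2.07

# Recover the coefficient on the logit scale
b <- log(or_per_unit)

# Odds ratio for a change of `delta` units is exp(b * delta)
delta <- 0.01
or_per_delta <- exp(b * delta)

# Equivalent form: raise the per-unit odds ratio to the power delta
or_per_delta_alt <- or_per_unit^delta

print(or_per_delta)
```

Under these assumptions, a .01 increase in Thoughts multiplies the odds by a factor only slightly above 1 (an increase of less than 1% in the odds). It does not add 0.07 to the odds and it does not add 2 units.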
To get probabilities, you need to do the conversion at selected values of Thoughts, because, as you can see in the plot above, the change in probability is not constant across the range of x values. If you want the probability at some value of Thoughts, get the answer as follows:
exp(intercept + coef*THOUGHT_Value)/(1 + exp(intercept + coef*THOUGHT_Value))
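In practice, predict() with type = "response" performs this calculation for you. Here is a sketch using the menarche model from above, refit so the block is self-contained; the ages in new_ages are arbitrary illustrative values. For your own model you would pass a data frame with a Thoughts column instead.

```r
library(MASS)   # provides the menarche data set
data("menarche")
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# Arbitrary ages at which to estimate the probability of menarche
new_ages <- data.frame(Age = c(10, 12, 14))

# type = "response" returns probabilities instead of logits
prob_predict <- predict(m, newdata = new_ages, type = "response")

# The same calculation by hand from the fitted coefficients
b <- coef(m)
logit <- b["(Intercept)"] + b["Age"] * new_ages$Age
prob_manual <- exp(logit) / (1 + exp(logit))

print(cbind(new_ages, prob_predict, prob_manual))
```

The two probability columns should match, which confirms that predict(..., type = "response") is applying the same inverse-logit formula to the fitted linear predictor.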