Problem description
I want to fit a piecewise regression with one break point xt, such that for x < xt we have a quadratic polynomial and for x >= xt we have a straight line. The two pieces should join smoothly, with continuity up to the first derivative at xt. Here's a picture of what it may look like:
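Written out, the smoothness requirement is a pair of conditions at the break point. Assume, for illustration only, that the quadratic is written in powers of (x - xt) and the line as α + β(x - xt), where α and β are hypothetical labels. The two conditions then pin the line down completely:

```latex
\begin{aligned}
q(x) &= a + b\,(x - x_t) + c\,(x - x_t)^2 \quad (x < x_t), \qquad
\ell(x) = \alpha + \beta\,(x - x_t) \quad (x \ge x_t),\\
q(x_t) &= \ell(x_t) \;\Longrightarrow\; \alpha = a,\\
q'(x_t) &= \ell'(x_t) \;\Longrightarrow\; \beta = b + 2c\,(x_t - x_t) = b.
\end{aligned}
```

Under this parametrization the whole curve is f(x) = a + b(x - xt) + c·min(x - xt, 0)², with a, b, c and xt as the only free parameters. The line has no slope of its own: it is the tangent of the quadratic at xt.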
I have parametrized my piecewise regression function as:
where a, b, c and xt are parameters to be estimated.
I want to compare this model with a quadratic polynomial regression over the whole range in terms of adjusted R-squared.
Here is my data:
y <- c(1, 0.59, 0.15, 0.078, 0.02, 0.0047, 0.0019, 1, 0.56, 0.13,
0.025, 0.0051, 0.0016, 0.00091, 1, 0.61, 0.12, 0.026, 0.0067,
0.00085, 4e-04)
x <- c(0, 5.53, 12.92, 16.61, 20.3, 23.07, 24.92, 0, 5.53, 12.92,
16.61, 20.3, 23.07, 24.92, 0, 5.53, 12.92, 16.61, 20.3, 23.07,
24.92)
My attempt goes as follows, for a known xt:
z <- pmax(0, x - xt)
x1 <- pmin(x, xt)
fit <- lm(y ~ x1 + I(x1 ^ 2) + z - 1)
But the straight line does not appear to be tangent to the quadratic polynomial at xt. Where am I going wrong?
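A likely explanation is that x1, x1^2 and z each get their own free coefficient. The fit is continuous at xt, because both pieces equal b1*xt + b2*xt^2 there. However, the slope of the line (the coefficient of z) is not tied to the slope of the quadratic at xt, which is b1 + 2*b2*xt. The `- 1` also removes the intercept, so the curve is forced through the origin even though y = 1 at x = 0.

Below is a minimal sketch of one way to build first-derivative continuity into the design matrix. It assumes a hypothetical known break point xt = 15 (chosen only for illustration) and the parametrization a + b*(x - xt) + c*min(x - xt, 0)^2, which has the same value and slope b on both sides of xt:

```r
## Data from the question
y <- c(1, 0.59, 0.15, 0.078, 0.02, 0.0047, 0.0019, 1, 0.56, 0.13,
       0.025, 0.0051, 0.0016, 0.00091, 1, 0.61, 0.12, 0.026, 0.0067,
       0.00085, 4e-04)
x <- c(0, 5.53, 12.92, 16.61, 20.3, 23.07, 24.92, 0, 5.53, 12.92,
       16.61, 20.3, 23.07, 24.92, 0, 5.53, 12.92, 16.61, 20.3, 23.07,
       24.92)
xt <- 15  ## hypothetical break point, for illustration only

## Original attempt: the slope of the line (coefficient of z) is free
z  <- pmax(0, x - xt)
x1 <- pmin(x, xt)
fit0 <- lm(y ~ x1 + I(x1 ^ 2) + z - 1)
b0 <- unname(coef(fit0))
slopes0 <- c(left = b0[1] + 2 * b0[2] * xt, right = b0[3])
slopes0  ## two independent estimates, so in general a kink at xt

## Constrained basis: a + b * u + c * min(u, 0)^2 with u = x - xt.
## Both pieces have value a and slope b at u = 0, so they join tangentially.
u <- x - xt
fit1 <- lm(y ~ u + I(pmin(u, 0) ^ 2))

## Numerical one-sided slopes of the fitted curve at xt
h <- 1e-6
f <- function(uu) unname(predict(fit1, newdata = data.frame(u = uu)))
slopes1 <- c(left = (f(0) - f(-h)) / h, right = (f(h) - f(0)) / h)
slopes1  ## both close to coef(fit1)["u"]
```

With a known xt this remains an ordinary `lm` fit. Estimating xt as well then reduces to searching over candidate break points.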
Similar questions:
- Piecewise regression with a straight line and a horizontal line joining at a break point
- Fitting a V-shape curve to my data (on Cross Validated)
Answer
In this section I will demonstrate a reproducible example. Please make sure you have sourced the functions defined in the other answer (getX, choose.c and pred are used below).
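The other answer is not reproduced here. For readers without it, here is a minimal hypothetical stand-in for getX. It is an assumption, not the original code: its columns are 1, x - c and min(x - c, 0)^2. With this basis, the coefficients reported in the coefficient table further down reproduce the "fit" column of the prediction table at x = 0, 0.05 and 0.25. That suggests, but does not prove, that it matches the original basis.

```r
## Hypothetical stand-in for getX() from the other answer (an assumption).
## Model: beta0 + beta1 * (x - c) + beta2 * min(x - c, 0)^2
##   x <  c: quadratic in (x - c)
##   x >= c: straight line with slope beta1, tangent to the quadratic at c
getX <- function(x, c) {
  u <- x - c
  cbind(beta0 = 1, beta1 = u, beta2 = pmin(u, 0) ^ 2)
}

## Coefficients as printed in fit$coef.table (break point c = 0.55)
beta.hat <- c(0.1143132, -0.2455986, 2.3661097)
x.check <- c(0, 0.05, 0.25)
fit.check <- drop(getX(x.check, 0.55) %*% beta.hat)
fit.check  ## compare with head(p): 0.9651406, 0.8286400, 0.4009427
```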
## we first generate a true model
set.seed(0)
x <- runif(100) ## sample points on [0, 1]
beta <- c(0.1, -0.2, 2) ## true coefficients
X <- getX(x, 0.6) ## model matrix with true break point at 0.6
y <- X %*% beta + rnorm(100, 0, 0.08) ## observations with Gaussian noise
plot(x, y)
Now, assume we don't know the break point c, and we would like to search for it on an evenly spaced grid:
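c.grid <- seq(0.1, 0.9, 0.05)  ## candidate break points
fit <- choose.c(x, y, c.grid)  ## fit at each candidate, keep the one with the smallest RSS
fit$c                          ## selected break point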
RSS has chosen 0.55. This is slightly different from the true value 0.6, but from the plot we see that the RSS curve does not vary much over [0.5, 0.6], so I am happy with this.
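The search is a profile-RSS search. For each fixed c the model is an ordinary linear model, so it can be fitted by least squares, and the c with the smallest RSS is kept.

Below is a minimal sketch. It assumes a hypothetical stand-in for getX with columns 1, x - c and min(x - c, 0)^2 (not the other answer's code) and a hypothetical helper named profile.rss. The original choose.c also returns standard errors, AIC/BIC and so on, which this sketch omits.

```r
## Hypothetical stand-in for getX() (an assumption, not the other answer's code)
getX <- function(x, c) {
  u <- x - c
  cbind(beta0 = 1, beta1 = u, beta2 = pmin(u, 0) ^ 2)
}

## Simulated data, generated as in the answer
set.seed(0)
x <- runif(100)
y <- drop(getX(x, 0.6) %*% c(0.1, -0.2, 2)) + rnorm(100, 0, 0.08)

## Hypothetical helper: RSS of the least-squares fit at each candidate c
profile.rss <- function(x, y, c.grid) {
  vapply(c.grid, function(cc) sum(lm.fit(getX(x, cc), y)$residuals ^ 2),
         numeric(1))
}

c.grid <- seq(0.1, 0.9, 0.05)
rss <- profile.rss(x, y, c.grid)
c.hat <- c.grid[which.min(rss)]               ## grid point with the smallest RSS
plot(c.grid, rss, type = "b", xlab = "c", ylab = "RSS")  ## the RSS curve
abline(v = c.hat, lty = 2)
c.hat
```

A flat stretch in this curve means the data cannot pin c down precisely. A finer grid, or `optimize()` on the same RSS function, trades more fits for a more precise location.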
The resulting model fit contains rich information:
str(fit)
#List of 12
# $ coefficients : num [1:3] 0.114 -0.246 2.366
# $ residuals : num [1:100] 0.03279 -0.01515 0.21188 -0.06542 0.00763 ...
# $ fitted.values: num [1:100] 0.0292 0.3757 0.2329 0.1087 0.0263 ...
# $ R : num [1:3, 1:3] -10 0.1 0.1 0.292 2.688 ...
# $ sig2 : num 0.00507
# $ coef.table : num [1:3, 1:4] 0.1143 -0.2456 2.3661 0.0096 0.0454 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:3] "beta0" "beta1" "beta2"
# .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
# $ aic : num -240
# $ bic : num -243
# $ c : num 0.55
# $ RSS : num 0.492
# $ r.squared : num 0.913
# $ adj.r.squared: num 0.911
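For the comparison the question asks for, adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - p), where p is the number of coefficients including the intercept.

Below is a minimal sketch comparing the smooth piecewise model with a global quadratic on the simulated data. It assumes a hypothetical stand-in for getX (columns 1, x - c, min(x - c, 0)^2) used only to generate the data, and it treats c = 0.55 as fixed. `summary.lm` does not count an estimated break point as a parameter, so whether to charge an extra degree of freedom for c is a modelling choice left open here.

```r
## Hypothetical stand-in for getX() (an assumption, not the other answer's code)
getX <- function(x, c) {
  u <- x - c
  cbind(beta0 = 1, beta1 = u, beta2 = pmin(u, 0) ^ 2)
}

## Simulated data, generated as in the answer
set.seed(0)
x <- runif(100)
y <- drop(getX(x, 0.6) %*% c(0.1, -0.2, 2)) + rnorm(100, 0, 0.08)

## Adjusted R-squared from the RSS of a fit with p coefficients (intercept included)
adj.r2 <- function(y, rss, p) {
  n <- length(y)
  r2 <- 1 - rss / sum((y - mean(y)) ^ 2)
  1 - (1 - r2) * (n - 1) / (n - p)
}

c.hat <- 0.55                               ## break point treated as known
u <- x - c.hat
fit.pw   <- lm(y ~ u + I(pmin(u, 0) ^ 2))   ## smooth piecewise model, 3 coefficients
fit.quad <- lm(y ~ x + I(x ^ 2))            ## global quadratic, 3 coefficients

res <- c(piecewise = adj.r2(y, sum(resid(fit.pw) ^ 2), 3),
         quadratic = adj.r2(y, sum(resid(fit.quad) ^ 2), 3))
res
```

To charge c as a fourth parameter, call `adj.r2(y, sum(resid(fit.pw) ^ 2), 4)` for the piecewise model.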
We can extract the summary table for coefficients:
fit$coef.table
# Estimate Std. Error t value Pr(>|t|)
#beta0 0.1143132 0.009602697 11.904286 1.120059e-20
#beta1 -0.2455986 0.045409356 -5.408546 4.568506e-07
#beta2 2.3661097 0.169308226 13.975161 5.730682e-25
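A table like this can be rebuilt from a QR decomposition of the model matrix. The standard errors are sqrt(diag(sig2 * (R'R)^-1)) with sig2 = RSS/(n - p), and the two-sided p-values come from a t distribution with n - p degrees of freedom.

Below is a minimal sketch of that standard least-squares computation. It assumes a hypothetical stand-in for getX (columns 1, x - c, min(x - c, 0)^2) with c fixed at 0.55, and it is not necessarily how choose.c computes the table internally.

```r
## Hypothetical stand-in for getX() (an assumption, not the other answer's code)
getX <- function(x, c) {
  u <- x - c
  cbind(beta0 = 1, beta1 = u, beta2 = pmin(u, 0) ^ 2)
}

## Simulated data, generated as in the answer
set.seed(0)
x <- runif(100)
y <- drop(getX(x, 0.6) %*% c(0.1, -0.2, 2)) + rnorm(100, 0, 0.08)

X <- getX(x, 0.55)
n <- nrow(X); p <- ncol(X)
qrX  <- qr(X)                               ## full rank here, so no column pivoting
beta <- qr.coef(qrX, y)
rss  <- sum(qr.resid(qrX, y) ^ 2)
sig2 <- rss / (n - p)                       ## residual variance
Rinv <- backsolve(qr.R(qrX), diag(p))       ## inverse of the upper-triangular R
se   <- sqrt(sig2 * rowSums(Rinv ^ 2))      ## sqrt(diag(sig2 * (R'R)^{-1}))
tval <- beta / se
coef.table <- cbind(Estimate = beta, "Std. Error" = se, "t value" = tval,
                    "Pr(>|t|)" = 2 * pt(abs(tval), n - p, lower.tail = FALSE))
coef.table
```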
Finally, we want to make predictions on a grid of new x values and plot them.
x.new <- seq(0, 1, 0.05)
p <- pred(fit, x.new)
head(p)
# fit se.fit lwr upr
#[1,] 0.9651406 0.02903484 0.9075145 1.0227668
#[2,] 0.8286400 0.02263111 0.7837235 0.8735564
#[3,] 0.7039698 0.01739193 0.6694516 0.7384880
#[4,] 0.5911302 0.01350837 0.5643199 0.6179406
#[5,] 0.4901212 0.01117924 0.4679335 0.5123089
#[6,] 0.4009427 0.01034868 0.3804034 0.4214819
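The lwr and upr columns above are consistent with fit ∓ qt(0.975, 97) × se.fit. For example, (0.9651406 - 0.9075145) / 0.02903484 ≈ 1.985. That is, they look like pointwise 95% confidence intervals for the mean response with n - p = 100 - 3 degrees of freedom.

Below is a minimal sketch of that computation. It assumes a hypothetical stand-in for getX (columns 1, x - c, min(x - c, 0)^2) with c fixed at 0.55 and a hypothetical helper named pred.sketch. It is checked against `predict.lm(interval = "confidence")`, not against the original pred.

```r
## Hypothetical stand-in for getX() (an assumption, not the other answer's code)
getX <- function(x, c) {
  u <- x - c
  cbind(beta0 = 1, beta1 = u, beta2 = pmin(u, 0) ^ 2)
}

## Simulated data, generated as in the answer
set.seed(0)
x <- runif(100)
y <- drop(getX(x, 0.6) %*% c(0.1, -0.2, 2)) + rnorm(100, 0, 0.08)

## Hypothetical helper: fit, standard error and confidence limits at x.new
pred.sketch <- function(x, y, c, x.new, level = 0.95) {
  X <- getX(x, c)
  qrX <- qr(X)
  beta <- qr.coef(qrX, y)
  rdf <- nrow(X) - ncol(X)                       ## residual degrees of freedom
  sig2 <- sum(qr.resid(qrX, y) ^ 2) / rdf
  Rinv <- backsolve(qr.R(qrX), diag(ncol(X)))
  X.new <- getX(x.new, c)
  fit <- drop(X.new %*% beta)
  se.fit <- sqrt(sig2 * rowSums((X.new %*% Rinv) ^ 2))  ## sqrt(diag(X.new (R'R)^{-1} X.new'))
  tq <- qt((1 + level) / 2, rdf)
  cbind(fit = fit, se.fit = se.fit, lwr = fit - tq * se.fit, upr = fit + tq * se.fit)
}

x.new <- seq(0, 1, 0.05)
p <- pred.sketch(x, y, 0.55, x.new)
head(p)
```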
We can plot the data together with the fitted curve and its lower and upper bounds:
plot(x, y, cex = 0.5)  ## observations
matlines(x.new, p[,-2], col = c(1,2,2), lty = c(1,2,2), lwd = 2)  ## fit (black, solid); lwr and upr (red, dashed)