



I want to simulate left-truncated failure time data from a Weibull distribution. My objective is to simulate the data and then recover the coefficients (of x1, x2, x3, x4, and x5, which I used for the simulation) by fitting a Weibull regression model. Here xt = runif(N, 30, 80) denotes the start of the study, and the variable Tm <- qweibull(runif(N, pweibull(xt, shape = 7.5, scale = 82*exp(lp)), 1), shape = 7.5, scale = 82*exp(lp)) denotes the failure time. But whenever I run the regression, I get this warning message:

Warning message:
In Surv(xt, time_M, event_M) : Stop time must be > start time, NA created


library(survival)  # provides Surv()
library(eha)       # provides weibreg()

N <- 10^5
H <- within(data.frame(xt=runif(N, 30, 80), x1=rnorm(N, 2, 1), x2=rnorm(N, -2, 1)), {
  # Correlated continuous covariates
  x3 <- rnorm(N, 0.5*x1 + 0.5*x2, 2)
  x4 <- rnorm(N, 0.3*x1 + 0.3*x2 + 0.3*x3, 2 )
  # Linear predictors and category probabilities for the four-level covariate x5
  lp1 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp2 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp3 <- 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp4 <- 0
  P1 <- exp(lp1)/(exp(lp2)+ exp(lp3)+1+exp(lp1))
  P2 <- exp(lp2)/(exp(lp1)+ exp(lp3)+1+exp(lp2))
  P3 <- exp(lp3)/(exp(lp2)+ exp(lp1)+1+exp(lp3))
  P4 <- 1/(exp(lp2)+ exp(lp3)+exp(lp1)+1)
  mChoices <- t(apply(cbind(P1,P2,P3,P4), 1, rmultinom, n = 1, size = 1))
  x5 <- apply(mChoices, 1, function(x) which(x==1))
  # Linear predictor entering the Weibull scale
  lp <-   0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5==1) + log(5)*(x5==2) + log(2)*(x5==3)
  # Failure time from a Weibull left-truncated at xt (inverse-CDF method)
  Tm <- qweibull(runif(N,pweibull(xt,shape = 7.5, scale = 82*exp(lp)),1), shape=7.5, scale=82*exp(lp))
  # Administrative censoring at 100
  Cens <- 100
  time_M <- pmin(Tm,Cens)
  event_M <- time_M == Tm
})
res.full_M <- weibreg(Surv(H$xt,H$time_M, H$event_M) ~ x1 + x2 + x3 + x4 + factor(x5), data = H)

So can anyone help me modify this code so that the starting age (xt) is less than the corresponding failure time (time_M), and so that the fitted regression model has coefficient values close to those in the following equation? (lp <- 0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5==1) + log(5)*(x5==2) + log(2)*(x5==3))



Your first comment implies that you want (possibly censored) times from age 30 to diagnosis. You have two options: work with "survival times", or work with the date of the patient's 30th birthday and their date of diagnosis. The former is easier, because it makes the censoring rate easier to specify.

  1. Generate an uncensored survival time (T) from the distribution of your choice.
  2. Draw a random number from a Uniform(0, 1) distribution. If this number is less than your censoring rate, the observation is censored: go to 3. Otherwise, your uncensored observed survival time is (T).
  3. Draw another random variable (X) from a Uniform(0, 1) distribution. Set T = T*X. This is your censored survival time.


This procedure will give you data from any distribution of survival times, censored at the rate of your choice.
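
The three steps above can be sketched in R as follows. This is a minimal illustration, not a definitive implementation: the helper name `simulate_censored` and its arguments are hypothetical, and the Weibull shape and scale are borrowed from the question purely as an example distribution.

```r
# Hypothetical helper: simulate survival times censored at a chosen rate.
#   n         number of observations
#   cens_rate target proportion of censored observations (between 0 and 1)
#   rdist     function(n) returning n uncensored survival times
simulate_censored <- function(n, cens_rate, rdist) {
  T_true   <- rdist(n)                 # step 1: uncensored survival times
  censored <- runif(n) < cens_rate     # step 2: decide which rows are censored
  X        <- runif(n)                 # step 3: shrink factor for censored rows
  time     <- ifelse(censored, T_true * X, T_true)
  data.frame(time = time, event = !censored)
}

set.seed(42)
d <- simulate_censored(
  n = 1000,
  cens_rate = 0.3,
  rdist = function(n) rweibull(n, shape = 7.5, scale = 82)  # example values
)
mean(!d$event)  # observed censoring proportion, should be near cens_rate
```

The resulting `time` and `event` columns can be passed to `Surv(time, event)` from the survival package.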


However, my reading of your specification tells me that every participant will at some point be diagnosed with the condition of interest. There are no competing risks. Is this reasonable?


Your second comment is confusing. Is your time to event (a) "time from age 30 to diagnosis" (which would imply right censoring) or (b) "time from onset of disease until diagnosis" (which would imply left censoring and could also involve right censoring)? If (a), my solution still holds. If (b), you need to supply more information:

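  • What is the time course (distribution) from age 30 to disease onset?
  • When, and how often, is the diagnostic procedure performed?
  • What is the probability that the diagnostic procedure gives each of the following results: false positive, false negative, true positive, true negative?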


It's still possible to generate the data you want, but it's not as easy as in (a).
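
Separately, on the warning in the question itself: one plausible cause (an assumption, not verified against the original run) is loss of precision when `pweibull(xt, ...)` is very close to 1, so that the inverse-CDF draw lands on or just below `xt`. A sketch that avoids this works on the cumulative-hazard scale: for a Weibull with cumulative hazard (t/scale)^shape, the increment in cumulative hazard beyond the entry time is a standard exponential. The helper name `rweibull_ltrunc` is hypothetical, and the normal linear predictor below is only a stand-in for the one in the question.

```r
# Hypothetical helper: draw T ~ Weibull(shape, scale) conditional on T > a.
# Uses H(T) - H(a) ~ Exp(1), where H(t) = (t / scale)^shape.
rweibull_ltrunc <- function(a, shape, scale) {
  H_a <- (a / scale)^shape          # cumulative hazard at the entry time
  E   <- rexp(length(a))            # standard exponential increments
  scale * (H_a + E)^(1 / shape)     # invert the cumulative hazard
}

set.seed(1)
N  <- 1e4
xt <- runif(N, 30, 80)              # entry (left-truncation) times
lp <- rnorm(N, 0, 0.3)              # stand-in linear predictor (assumption)
Tm <- rweibull_ltrunc(xt, shape = 7.5, scale = 82 * exp(lp))

Cens    <- 100                      # administrative censoring time
time_M  <- pmin(Tm, Cens)
event_M <- Tm <= Cens
sum(time_M <= xt)                   # rows Surv() would flag as stop <= start
```

Because the draw adds a positive increment to the hazard at entry, the simulated time exceeds `xt` up to floating-point rounding, without ever evaluating a probability near 1.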

