伽马分布拟合误差

本文介绍了伽马分布拟合误差的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

以下是我的代码:

library(fitdistrplus)

s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8,3, 15, 3, 9, 22, 0, 4, 10, 1, 9, 10, 11,
 2, 8, 2, 6, 0, 15, 0 , 2, 11, 0, 6, 3, 5, 0, 7, 6, 0, 7, 1, 0, 6, 4, 1, 3, 5,
 2, 6, 0, 10, 6, 4, 1, 17, 0, 1, 0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1,
 3, 2, 9, 4, 4, 1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)

o <- fitdist(s, "gamma", method = "mle")
summary(o)
plot(o)

并且错误说:

and the error says:

fitdist(s, "gamma", method = "mle") 中的错误:函数 mle未能估计参数，错误代码为 100

推荐答案

Gamma 分布不允许零值(对于 0 的响应，似然将评估为零，对数似然将是无限的)，除非形状参数正好是 1.0(即指数分布 - 见下文)......这是一个统计/数学问题，而不是一个编程问题.您将不得不为零值找到一些明智的做法.根据对您的应用程序有意义的内容，您可以(例如)

The Gamma distribution doesn't allow zero values (the likelihood will evaluate to zero, and the log-likelihood will be infinite, for a response of 0) unless the shape parameter is exactly 1.0 (i.e., an exponential distribution - see below) ... that's a statistical/mathematical problem, not a programming problem. You're going to have to find something sensible to do about the zero values. Depending on what makes sense for your application, you could (for example)

选择不同的发行版进行测试
排除零值 (fitdist(s[s>0], ...))
将零值设置为一些合理的非零值 (fitdist(replace(s,which(s==0),0.1),...)

其中哪个(如果有)最好取决于您的应用.

@Sandipan Dey 的第一个答案(在数据集中留下零)看似是有道理的，但实际上它卡在形状参数等于 1 处.

@Sandipan Dey's first answer (leaving the zeros in the data set) appears to make sense, but in fact it gets stuck at the shape parameter equal to 1.

o <- fitdist(s, "exp", method = "mle")

给出与@Sandipan 代码相同的答案(除了它估计速率=0.2161572，为 Gamma 分布估计的尺度参数的倒数 = 4.626262 - 这只是参数化的变化).如果您选择拟合指数而不是 Gamma，那很好 - 但您应该有目的地这样做，而不是偶然...

gives the same answer as @Sandipan's code (except that it estimates rate=0.2161572, the inverse of the scale parameter=4.626262 that's estimated for the Gamma distribution - this is just a change in parameterization). If you choose to fit an exponential instead of a Gamma, that's fine - but you should do it on purpose, not by accident ...

为了说明包含零的拟合可能无法按预期工作，我将构建自己的负对数似然函数并显示每种情况的似然面.

To illustrate that the zeros-included fit may not be working as expected, I'll construct my own negative log-likelihood function and display the likelihood surface for each case.

mfun <- function(sh,sc,dd=s) {
  -sum(dgamma(dd,shape=sh,scale=sc,log=TRUE))
}

library(emdbook)  ## for curve3d() helper function

包含零的曲面:

cc1 <- curve3d(mfun(x,y),
      ## set up "shape" limits" so we evaluate
      ## exactly shape=1.000 ...
        xlim=c(0.55,3.55),
        n=c(41,41),
        ylim=c(2,5),
        sys3d="none")


png("gammazero1.png")
with(cc1,image(x,y,z))
dev.off()

在这种情况下，表面仅定义为 shape=1(即指数分布)；白色区域代表无限对数似然.不是 shape=1 是最佳合身，而是唯一合身...

In this case the surface is only defined at shape=1 (i.e. an exponential distribution); the white regions represent infinite log-likelihoods. It's not that shape=1 is the best fit, it's that it's the only fit ...

排除零的曲面:

cc2 <- curve3d(mfun(x,y,dd=s[s>0]),
               ## set up "shape" limits" so we evaluate
               ## exactly shape=1.000 ...
               xlim=c(0.55,3.55),
               n=c(41,41),
               ylim=c(2,5),
               sys3d="none")


png("gammazero2.png")
with(cc2,image(x,y,z))
with(cc2,contour(x,y,z,add=TRUE))
abline(v=1.0,lwd=2,lty=2)
dev.off()

这篇关于伽马分布拟合误差的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！