Reliably retrieving the inverse of the quantile function

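I have read other posts (for example, here) about getting the "inverse" of quantile, that is, getting the percentile that corresponds to a given value in a series of values.

However, for the same data series, the answers there do not give me the same value that quantile does.

I have also found that quantile offers 9 different algorithms for computing percentiles.

So my question is: is there a reliable way to get the inverse of the quantile function? ecdf does not take a "type" argument, so there seems to be no way to make sure both functions use the same method.

Reproducible example: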

# Simple data
x = 0:10
pcntile = 0.5


# Get value corresponding to a percentile using quantile
(pcntile_value <- quantile(x, pcntile))

# 50%
# 5               # returns 5 as expected for 50% percentile



# Get percentile corresponding to a value using ecdf function
(pcntile_rev <- ecdf(x)(5))


# [1] 0.5454545   #returns 54.54% as the percentile for the value 5


# Not the same answer as quantile produces

Best answer

The answer in the link is really good, but it may help to take a look at ecdf itself.
Just run the following code:

# Simple data
x = 0:10
p0 = 0.5

# Get value corresponding to a percentile using quantile
sapply(c(1:7), function(i) quantile(x, p0, type = i))
# 50% 50% 50% 50% 50% 50% 50%
# 5.0 5.0 5.0 4.5 5.0 5.0 5.0
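The call above only covers types 1 to 7, while quantile documents nine types. As a minimal sketch (the variable name all_types is only illustrative), the same check can be run over all of them; on this data only type 4 should differ:

```r
# Simple data, as in the question
x <- 0:10
p0 <- 0.5

# Median under each of the nine quantile types; names = FALSE drops the "50%" labels
all_types <- sapply(1:9, function(i) quantile(x, p0, type = i, names = FALSE))
print(all_types)

# Which types do not return 5 for this data?
which(abs(all_types - 5) > 1e-8)
```

None of the nine types returns 0.5454545 or maps it back to 0.5, which is consistent with the mismatch not being caused by the type argument alone.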

So it is not a question of type. You can step into the function using debug:

# Get percentile corresponding to a value using ecdf function
debug(ecdf)
my_ecdf <- ecdf(x)

The essential part is

rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n,
    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")

After that, while still inside the debugger (where vals and n are defined), you can check

data.frame(x = vals, y = round(cumsum(tabulate(match(x, vals)))/n, 3), stringsAsFactors = FALSE)

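Since you divide by n = 11, the result is not surprising. As mentioned, for the theory have a look at the other answer.

By the way, you can also plot the function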
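To make the arithmetic concrete, here is a small sketch that reproduces the ecdf value by hand, without the debugger. It assumes only the data from the question; the names n and count_le are illustrative.

```r
x <- 0:10
n <- length(x)             # 11 observations

# ecdf(x)(v) is the share of observations that are <= v
count_le <- sum(x <= 5)    # 0, 1, 2, 3, 4, 5 -> 6 observations
count_le / n               # 6/11, about 0.5454545

# Same value as the ecdf from the question
all.equal(ecdf(x)(5), count_le / n)
```

Because the value 5 itself is counted, the step function reaches 6/11 rather than 0.5 at that point.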

plot(my_ecdf)

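Regarding your comment: I think it is not a question of reliability, but a question of how to define the "inverse distribution function" when a true inverse does not exist:

A good reference for generalized inverses: Paul Embrechts, Marius Hofert: "A note on generalized inverses", Math Meth Oper Res (2013) 77:423–432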
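As an illustration of that point, the sketch below uses the common definition of the generalized inverse, F^-(p) = inf{v : F(v) >= p}. The helper gen_inverse is hypothetical and not part of R. The documentation of quantile describes type 1 as the inverse of the empirical distribution function, so under that definition ecdf and quantile(type = 1) should round-trip on the observed values; this is a check on the question's data, not a general proof.

```r
x <- 0:10
Fn <- ecdf(x)

# Hypothetical helper: smallest observed value v with Fn(v) >= p
gen_inverse <- function(p) min(x[Fn(x) >= p])

p <- Fn(5)                                # 6/11
gen_inverse(p)                            # maps back to 5
quantile(x, p, type = 1, names = FALSE)   # type 1 also maps back to 5

# Round trip over every observed value
round_trip <- quantile(x, Fn(x), type = 1, names = FALSE)
all(round_trip == x)
```

The reverse direction, Fn(quantile(x, p, type = 1)), only returns p when p is one of the steps k/n, which is why asking for the percentile of the median gives 6/11 instead of 0.5.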

On reliably retrieving the inverse of the quantile function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56724460/