Problem description
I see a lot of questions and answers about order and sort. Is there anything that sorts vectors or data frames into groupings (like quartiles or deciles)? I have a "manual" solution, but there's likely a better solution that has been group-tested.
Here's my attempt:
temp <- data.frame(name=letters[1:12], value=rnorm(12), quartile=rep(NA, 12))
temp
# name value quartile
# 1 a 2.55118169 NA
# 2 b 0.79755259 NA
# 3 c 0.16918905 NA
# 4 d 1.73359245 NA
# 5 e 0.41027113 NA
# 6 f 0.73012966 NA
# 7 g -1.35901658 NA
# 8 h -0.80591167 NA
# 9 i 0.48966739 NA
# 10 j 0.88856758 NA
# 11 k 0.05146856 NA
# 12 l -0.12310229 NA
temp.sorted <- temp[order(temp$value), ]
temp.sorted$quartile <- rep(1:4, each=12/4)
temp <- temp.sorted[order(as.numeric(rownames(temp.sorted))), ]
temp
# name value quartile
# 1 a 2.55118169 4
# 2 b 0.79755259 3
# 3 c 0.16918905 2
# 4 d 1.73359245 4
# 5 e 0.41027113 2
# 6 f 0.73012966 3
# 7 g -1.35901658 1
# 8 h -0.80591167 1
# 9 i 0.48966739 3
# 10 j 0.88856758 4
# 11 k 0.05146856 2
# 12 l -0.12310229 1
Is there a better (cleaner/faster/one-line) approach? Thanks!
Recommended answer
The method I use is one of these, or Hmisc::cut2(value, g=4):
temp$quartile <- with(temp, cut(value,
breaks=quantile(value, probs=seq(0,1, by=0.25), na.rm=TRUE),
include.lowest=TRUE))
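As a quick check, the cut-based version can be compared with the manual approach from the question. This is a minimal sketch, assuming the twelve values printed in the question (the original data came from rnorm, so they are copied here by hand); the variable name manual is introduced only for illustration.

```r
# Values copied from the data frame printed in the question
temp <- data.frame(
  name  = letters[1:12],
  value = c(2.55118169, 0.79755259, 0.16918905, 1.73359245,
            0.41027113, 0.73012966, -1.35901658, -0.80591167,
            0.48966739, 0.88856758, 0.05146856, -0.12310229)
)

# Quartile factor via cut(), as in the answer
temp$quartile <- with(temp, cut(value,
  breaks = quantile(value, probs = seq(0, 1, by = 0.25), na.rm = TRUE),
  include.lowest = TRUE))

# Manual quartiles: order by value, then three rows per group
manual <- integer(nrow(temp))
manual[order(temp$value)] <- rep(1:4, each = nrow(temp) / 4)

# The integer codes of the factor agree with the manual grouping
all(as.integer(temp$quartile) == manual)   # TRUE for this data
levels(temp$quartile)                      # interval labels built from the break values
```

The two approaches agree here because there are twelve distinct values and twelve is divisible by four; with ties or other lengths, the quantile-based breaks and the equal-count split need not coincide.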
Another possibility is:
temp$quartile <- with(temp, factor(
  # findInterval returns 1..4 for the four intervals between the break points
  findInterval(value, c(-Inf,
    quantile(value, probs=c(0.25, .5, .75), na.rm=TRUE), Inf)),
  labels=c("Q1","Q2","Q3","Q4")
))
The first version has the side effect of labeling the quartiles with their value ranges, which I consider a "good thing". If that is not "good for you", or if the valid problems raised in the comments are a concern, you could go with version 2. You can use labels= in cut, or you could add this line to your code:
temp$quartile <- factor(temp$quartile, labels=c("1","2","3","4") )
Or, even quicker but slightly more obscure in how it works, use the following; note that the result is no longer a factor but a numeric vector:
temp$quartile <- as.numeric(temp$quartile)
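The question also mentions deciles, and the same idea extends to any number of groups. The following is a minimal sketch rather than a tested library function: quantile_group is a hypothetical helper name, and it passes labels=FALSE to cut so that integer group codes come back directly.

```r
# Hypothetical helper: assign each element of x to one of n quantile groups
quantile_group <- function(x, n = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # labels = FALSE returns integer group codes; NA inputs stay NA
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- 1:100
deciles <- quantile_group(x, n = 10)
table(deciles)   # ten groups of ten for this evenly spaced input
```

One caveat: if the data contain many tied values, two quantiles can coincide and cut will stop with an error about non-unique breaks, so heavily tied data may need a different approach.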