本文介绍了如何通过对数据框中的列进行排序来快速形成组(四分位数,十分位数等)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看到很多关于ordersort的问题和答案.是否有将矢量或数据帧分类为分组的东西(如四分位数或十分位数)?我有一个手动"解决方案,但是可能有一个经过小组测试的更好的解决方案.

I see a lot of questions and answers re order and sort. Is there anything that sorts vectors or data frames into groupings (like quartiles or deciles)? I have a "manual" solution, but there's likely a better solution that has been group-tested.

这是我的尝试:

temp <- data.frame(name=letters[1:12], value=rnorm(12), quartile=rep(NA, 12))
temp
#    name       value quartile
# 1     a  2.55118169       NA
# 2     b  0.79755259       NA
# 3     c  0.16918905       NA
# 4     d  1.73359245       NA
# 5     e  0.41027113       NA
# 6     f  0.73012966       NA
# 7     g -1.35901658       NA
# 8     h -0.80591167       NA
# 9     i  0.48966739       NA
# 10    j  0.88856758       NA
# 11    k  0.05146856       NA
# 12    l -0.12310229       NA
temp.sorted <- temp[order(temp$value), ]
temp.sorted$quartile <- rep(1:4, each=12/4)
temp <- temp.sorted[order(as.numeric(rownames(temp.sorted))), ]
temp
#    name       value quartile
# 1     a  2.55118169        4
# 2     b  0.79755259        3
# 3     c  0.16918905        2
# 4     d  1.73359245        4
# 5     e  0.41027113        2
# 6     f  0.73012966        3
# 7     g -1.35901658        1
# 8     h -0.80591167        1
# 9     i  0.48966739        3
# 10    j  0.88856758        4
# 11    k  0.05146856        2
# 12    l -0.12310229        1

是否有更好的方法(更清洁/更快/单行)?谢谢!

Is there a better (cleaner/faster/one-line) approach? Thanks!

推荐答案

我使用的方法是以下方法之一或Hmisc::cut2(value, g=4):

The method I use is one of these or Hmisc::cut2(value, g=4):

temp$quartile <- with(temp, cut(value, 
                                breaks=quantile(value, probs=seq(0,1, by=0.25), na.rm=TRUE), 
                                include.lowest=TRUE))

替代方法可能是:

temp$quartile <- with(temp, factor(
                            findInterval( val, c(-Inf,
                               quantile(val, probs=c(0.25, .5, .75)), Inf) , na.rm=TRUE), 
                            labels=c("Q1","Q2","Q3","Q4")
      ))

第一个具有用值标记四分位数的副作用,我认为这是好事",但是如果它不对您有好处",或者注释中提出的有效问题令人担忧您可以使用版本2.可以在cut中使用labels=,也可以将以下行添加到代码中:

The first one has the side-effect of labeling the quartiles with the values, which I consider a "good thing", but if it were not "good for you", or the valid problems raised in the comments were a concern you could go with version 2. You can use labels= in cut, or you could add this line to your code:

temp$quartile <- factor(temp$quartile, levels=c("1","2","3","4") )

或者它的工作原理更快或更模糊,尽管它不再是一个因素,而是一个数字矢量:

Or even quicker but slightly more obscure in how it works, although it is no longer a factor, but rather a numeric vector:

temp$quartile <- as.numeric(temp$quartile)

这篇关于如何通过对数据框中的列进行排序来快速形成组(四分位数,十分位数等)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-24 15:40