How can I sort an R data frame by a column made up of long strings? The following example illustrates my problem:

> a = matrix(NA, nrow=4, ncol=3)
> a[,1] = c(1,2,3,4)
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M")
> colnames(a) = c("value","sortkey","other")
> a = as.data.frame(a)
> a
  value     sortkey other
1     1 gene001_10M  <NA>
2     2 gene002_10M  <NA>
3     3 gene001_50M  <NA>
4     4 gene002_50M  <NA>

When I now sort "a", the sort key seems to be read from right to left, and "a" stays unchanged:
> b = a[sort(a$sortkey),]
> b
  value     sortkey other
1     1 gene001_10M  <NA>
2     2 gene002_10M  <NA>
3     3 gene001_50M  <NA>
4     4 gene002_50M  <NA>

However, what I want is:
> b
  value     sortkey other
1     1 gene001_10M  <NA>
3     3 gene001_50M  <NA>
2     2 gene002_10M  <NA>
4     4 gene002_50M  <NA>

Best answer

When the strings mix numbers and letters, it is usually better to use mixedorder from gtools, although here plain order is enough:

  a[order(as.character(a$sortkey)),]
  #  value     sortkey other
  #1     1 gene001_10M  <NA>
  #3     3 gene001_50M  <NA>
  #2     2 gene002_10M  <NA>
  #4     4 gene002_50M  <NA>
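The as.character() call matters when as.data.frame has turned the column into a factor, which was the default before R 4.0.0. As a minimal sketch of the same idea, the block below rebuilds the question's data with the key stored as plain character strings (the data.frame call and the name idx are my own additions, not part of the original answer) and shows that order() returns row positions that can be used directly for row indexing:

```r
# Rebuild the question's data with the key stored as plain character strings.
a <- data.frame(
  value   = 1:4,
  sortkey = c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M"),
  stringsAsFactors = FALSE
)

# order() returns row positions (a permutation), which is what row indexing needs.
idx <- order(a$sortkey)
print(idx)

# Index the rows with that permutation to get the sorted data frame.
b <- a[idx, ]
print(b)
```

With the key kept as character, order(a$sortkey) can be used without the as.character() wrapper.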

Also, sort returns the sorted values rather than the index:
   sort(as.character(a$sortkey))
   #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"

Otherwise, you have to specify index.return=TRUE, which is FALSE by default in sort:
   sort(as.character(a$sortkey), index.return=TRUE)
   #$x
   #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"

   #$ix
   #[1] 1 3 2 4

Then use:
   a[sort(as.character(a$sortkey), index.return=TRUE)$ix,]
  #  value     sortkey other
  #1     1 gene001_10M  <NA>
  #3     3 gene001_50M  <NA>
  #2     2 gene002_10M  <NA>
  #4     4 gene002_50M  <NA>

Alternatively, with mixedorder:
  library(gtools)
   mixedorder(as.character(a$sortkey))
   #[1] 1 3 2 4
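For the question's keys, order and mixedorder agree because the numeric parts are zero-padded to the same width. The difference only shows up when the widths vary. Here is a rough sketch using made-up labels (the vector labels is hypothetical, not taken from the question), assuming the gtools package is installed; the mixedorder part is skipped if it is not:

```r
# Hypothetical labels whose numeric parts have different widths.
labels <- c("gene10", "gene2", "gene1")

# Plain character ordering compares character by character,
# so "gene10" ends up before "gene2".
plain_sorted <- labels[order(labels)]
print(plain_sorted)

# mixedorder() compares the embedded numbers numerically.
# It lives in the gtools package, so only run it when gtools is available.
if (requireNamespace("gtools", quietly = TRUE)) {
  mixed_sorted <- labels[gtools::mixedorder(labels)]
  print(mixed_sorted)
}
```

If the keys can always be zero-padded, as in gene001 and gene002, base order is enough and the extra dependency can be avoided.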
