How do I sort an R data frame by a column made up of long strings? The following example illustrates my problem:
> a = matrix(NA, nrow=4, ncol=3)
> a[,1] = c(1,2,3,4)
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M")
> colnames(a) = c("value","sortkey","other")
> a = as.data.frame(a)
> a
value sortkey other
1 1 gene001_10M <NA>
2 2 gene002_10M <NA>
3 3 gene001_50M <NA>
4 4 gene002_50M <NA>
When I now sort "a", the sort key seems to be read from right to left, and "a" stays unchanged:
> b = a[sort(a$sortkey),]
> b
value sortkey other
1 1 gene001_10M <NA>
2 2 gene002_10M <NA>
3 3 gene001_50M <NA>
4 4 gene002_50M <NA>
However, what I am aiming for is:
> b
value sortkey other
1 1 gene001_10M <NA>
3 3 gene001_50M <NA>
2 2 gene002_10M <NA>
4 4 gene002_50M <NA>
Best answer
When the keys mix numbers and letters, it is better to use mixedorder from gtools, although here plain order is enough:
a[order(as.character(a$sortkey)),]
# value sortkey other
#1 1 gene001_10M <NA>
#3 3 gene001_50M <NA>
#2 2 gene002_10M <NA>
#4 4 gene002_50M <NA>
Also note that sort returns the sorted values, not the index:
sort(as.character(a$sortkey))
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
To get the index from sort, you have to specify index.return=TRUE, which is FALSE by default:
sort(as.character(a$sortkey), index.return=TRUE)
#$x
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
#$ix
#[1] 1 3 2 4
Then use the $ix component to index the rows:
a[sort(as.character(a$sortkey), index.return=TRUE)$ix,]
# value sortkey other
#1 1 gene001_10M <NA>
#3 3 gene001_50M <NA>
#2 2 gene002_10M <NA>
#4 4 gene002_50M <NA>
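As for why the original a[sort(a$sortkey),] left "a" unchanged, one plausible explanation is the following, assuming sortkey was stored as a factor (the as.data.frame default before R 4.0). sort returns the sorted factor, whose integer codes are simply 1 to 4, and used as a row index those codes pick the rows in their original order. A minimal sketch on a standalone vector (key is an illustrative name, not taken from the question):

```r
# Assumption: the column is a factor, as it would be with stringsAsFactors = TRUE.
key <- factor(c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M"))

# sort() returns the values in sorted order; their factor codes are 1, 2, 3, 4.
sorted_codes <- as.integer(sort(key))

# order() returns the permutation of positions that sorts the vector.
positions <- order(key)

print(sorted_codes)
print(positions)
```

Indexing with positions reorders the rows, whereas indexing with the sorted values does not.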
mixedorder from gtools returns the same index:
library(gtools)
mixedorder(as.character(a$sortkey))
#[1] 1 3 2 4
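Plain order works here only because the numeric parts are zero-padded to the same width. Below is a rough sketch of the case where it stops working, using only base R. The vector ids is made up for illustration, and stripping the non-digits with gsub is just one simple way to get a natural ordering, not a description of how mixedorder is implemented:

```r
# Hypothetical keys without zero padding (not from the question).
ids <- c("gene10", "gene2", "gene1")

# Character comparison puts "gene10" before "gene2".
lexical <- ids[order(ids)]

# Strip the non-digit characters and order by the remaining number instead.
num <- as.integer(gsub("[^0-9]", "", ids))
natural <- ids[order(num)]

print(lexical)
print(natural)
```

Keys of this kind are the situation the answer has in mind when it recommends mixedorder over order.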