Problem description
I have a matrix, for example:
a = rep(0:1, each=4)
b = rep(rep(0:1, each=2), 2)
c = rep(0:1, times=4)
mat = cbind(c,b,a)
I need to sort the rows of this matrix by all of its columns. I know how to do this when sorting by specific columns (i.e. a limited number of columns):
mat[order(mat[,"c"],mat[,"b"],mat[,"a"]),]
c b a
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1
However, I need a generic way of doing this without referring to any column names, because I could have any number of columns. How can I sort by a large number of columns?
Recommended answer
Here's a concise solution:
mat[do.call(order,as.data.frame(mat)),];
## c b a
## [1,] 0 0 0
## [2,] 0 0 1
## [3,] 0 1 0
## [4,] 0 1 1
## [5,] 1 0 0
## [6,] 1 0 1
## [7,] 1 1 0
## [8,] 1 1 1
The call to as.data.frame() converts the matrix to a data.frame in the intuitive way, i.e. each matrix column becomes a list component in the new data.frame. From that, you can effectively pass each matrix column as a separate argument to a single invocation of order() by passing the listified form of the matrix as the second argument of do.call().
This will work for any number of columns.
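As a quick check that no column names are needed, here is a minimal sketch that applies the same idiom to a four-column matrix with its dimnames removed. The helper name sort_rows_by_all_columns is hypothetical (it is not part of base R), and the drop = FALSE is an addition of mine so that a one-row result stays a matrix.

```r
# Hypothetical helper (not part of base R): sort the rows of a matrix
# by every column, left to right, without naming any column.
sort_rows_by_all_columns <- function(m) {
  # as.data.frame() turns each column into a list component;
  # do.call() passes those components as separate arguments to order().
  m[do.call(order, as.data.frame(m)), , drop = FALSE]
}

# A 4-column 0/1 matrix without column names, with its rows shuffled.
m <- as.matrix(expand.grid(0:1, 0:1, 0:1, 0:1))
dimnames(m) <- NULL
set.seed(1)
shuffled <- m[sample(nrow(m)), ]

sorted <- sort_rows_by_all_columns(shuffled)

# Same result as spelling out every column by hand.
by_hand <- shuffled[order(shuffled[, 1], shuffled[, 2],
                          shuffled[, 3], shuffled[, 4]), ]
identical(sorted, by_hand)
```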
It's not a dumb question. The reason that mat[order(as.data.frame(mat)),] does not work is that order() does not order data.frames by row.
Instead of returning a row order for the data.frame based on ordering the column vectors from left to right (which is what my solution does), it basically flattens the data.frame to a single big vector and orders that.
So, in fact, order(as.data.frame(mat)) is equivalent to order(mat), as a matrix is treated as a flat vector as well.
For your particular data, this returns 24 indexes (8 rows times 3 columns), which could theoretically be used to index the original matrix mat as a vector. But in the expression mat[order(as.data.frame(mat)),] you're trying to use them to index just the row dimension of mat, and some of the indexes are past the highest row index, so you get a "subscript out of bounds" error.
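The size mismatch can be seen directly. The sketch below rebuilds mat from the question so that it runs on its own, and uses order(mat) on the matrix itself rather than on the data.frame, since that is the case the answer describes as equivalent. The exact error text shown in the comment assumes an English locale.

```r
# Rebuild the question's matrix.
a <- rep(0:1, each = 4)
b <- rep(rep(0:1, each = 2), 2)
c <- rep(0:1, times = 4)
mat <- cbind(c, b, a)

# order() treats the matrix as one flat vector of 8 * 3 elements,
# so it returns one index per element rather than one per row.
idx <- order(mat)
length(idx)  # 24
nrow(mat)    # 8

# Using those indexes as row subscripts fails, because some exceed nrow(mat).
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(mat[idx, ], error = function(e) conditionMessage(e))
result       # "subscript out of bounds"
```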
See ?do.call.
I don't think I can explain it better than the help page; take a look at the examples and play with them until you get how it works. Basically, you need to call it when the arguments you want to pass to a single invocation of a function are trapped inside a list.
You can't pass the list itself (because then you're not passing the intended arguments, you're passing a list containing the intended arguments), so there must be a primitive function that "unwraps" the arguments from the list for the function call.
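A small sketch of that "unwrapping", using made-up vectors for illustration: do.call() builds the same call you would otherwise write out by hand, with unnamed list elements becoming positional arguments and named elements becoming named arguments.

```r
# The arguments we want to pass are trapped inside a list.
args <- list(c(2, 1, 2, 1), c(4, 3, 2, 1))

# Writing the call out by hand: two separate arguments to order().
by_hand <- order(args[[1]], args[[2]])

# do.call() builds the same call from the list.
via_do_call <- do.call(order, args)

via_do_call                      # 4 2 3 1
identical(by_hand, via_do_call)  # TRUE

# Named list elements are passed as named arguments.
joined <- do.call(paste, list("a", "b", sep = "-"))
joined                           # "a-b"
```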
This is a common primitive in programming languages where functions are first-class objects, notably (besides R's do.call()) JavaScript's apply(), Python's (deprecated) apply(), and vim's call().