How to sort a matrix/data.frame by all columns


Problem description

I have a matrix, for example:

a = rep(0:1, each=4)
b = rep(rep(0:1, each=2), 2)
c = rep(0:1, times=4)
mat = cbind(c,b,a)

I need to sort the rows of this matrix by all of its columns. I know how to do this by naming specific columns (i.e. a limited number of columns):

mat[order(mat[,"c"],mat[,"b"],mat[,"a"]),]
     c b a
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1

However, I need a generic way of doing this without naming any columns, because I could have any number of columns. How can I sort by a large number of columns?

Recommended answer

Here's a concise solution:

mat[do.call(order, as.data.frame(mat)), ]
##      c b a
## [1,] 0 0 0
## [2,] 0 0 1
## [3,] 0 1 0
## [4,] 0 1 1
## [5,] 1 0 0
## [6,] 1 0 1
## [7,] 1 1 0
## [8,] 1 1 1

The call to as.data.frame() converts the matrix to a data.frame in the intuitive way, i.e. each matrix column becomes a list component of the new data.frame. Because a data.frame is a list of columns, passing it as the second argument of do.call() hands each matrix column to a single invocation of order() as a separate argument.

This will work for any number of columns.
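As a minimal sketch, the one-liner can be wrapped in a small helper. The name sort_by_all_columns is hypothetical and not part of the original answer, and the unname() step is an added precaution, on the assumption that column names might otherwise collide with order()'s own argument names:

```r
# Hypothetical helper (not from the original answer): order the rows of a
# matrix or data.frame by all of its columns, left to right.
sort_by_all_columns <- function(x) {
  # unname() drops the column names so that a column called, for example,
  # "decreasing" is not mistaken for an argument of order().
  keys <- unname(as.list(as.data.frame(x)))
  x[do.call(order, keys), , drop = FALSE]
}

# Rebuild the matrix from the question.
a <- rep(0:1, each = 4)
b <- rep(rep(0:1, each = 2), 2)
c <- rep(0:1, times = 4)
mat <- cbind(c, b, a)

sorted <- sort_by_all_columns(mat)

# The explicit, column-by-column version from the question, for comparison.
explicit <- mat[order(mat[, "c"], mat[, "b"], mat[, "a"]), ]
identical(sorted, explicit)
```

If a descending sort is wanted, the extra argument can be appended to the list, e.g. do.call(order, c(keys, list(decreasing = TRUE))).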

It's not a dumb question. The reason that mat[order(as.data.frame(mat)),] does not work is that order() does not order data.frames by row.

Instead of returning a row order for the data.frame based on ordering the column vectors from left to right (which is what my solution does), it basically flattens the data.frame to a single big vector and orders that.

So, in fact, order(as.data.frame(mat)) is equivalent to order(mat), as a matrix is treated as a flat vector as well.

For your particular data, this returns 24 indexes, which could theoretically be used to index the original matrix mat as a vector. But in the expression mat[order(as.data.frame(mat)),] you are using them to index only the row dimension of mat, and some of the indexes are past the highest row index, so you get a "subscript out of bounds" error.
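The flattening can be seen directly with a short sketch. It uses order(mat), which the answer describes as equivalent, rather than order(as.data.frame(mat)), on the assumption that how order() handles a data.frame may differ between R versions:

```r
# The matrix from the question, built with explicit column names.
mat <- cbind(c = rep(0:1, times = 4),
             b = rep(rep(0:1, each = 2), 2),
             a = rep(0:1, each = 4))

# order() treats the matrix as one vector of 8 * 3 = 24 values.
flat_idx <- order(mat)
length(flat_idx)
max(flat_idx) > nrow(mat)

# Using those indexes on the row dimension fails; tryCatch() captures
# the error message instead of stopping the script.
msg <- tryCatch(mat[flat_idx, ], error = function(e) conditionMessage(e))
msg
```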

See ?do.call.

I don't think I can explain it better than the help page; take a look at the examples and play with them until you get how it works. Basically, you need to call it when the arguments you want to pass to a single invocation of a function are trapped inside a list.

You can't pass the list itself (because then you're not passing the intended arguments, you're passing a list containing the intended arguments), so there must be a primitive function that "unwraps" the arguments from the list for the function call.
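A small illustration of that difference, using paste() and sum() purely as stand-in functions (they are not part of the original answer):

```r
args <- list("a", "b", "c")

# Passing the list itself: paste() receives ONE argument (the list),
# so each element is converted to its own string.
as_one_argument <- paste(args)

# do.call() unpacks the list, so this is the same call as paste("a", "b", "c").
unpacked <- do.call(paste, args)

# Named list elements become named arguments of the call.
with_sep <- do.call(paste, list("a", "b", sep = "-"))

# The same idea with numbers: equivalent to sum(1, 2, 3).
total <- do.call(sum, list(1, 2, 3))
```

Here as_one_argument has three elements, while unpacked is a single string, which is the same distinction that separates order(as.data.frame(mat)) from do.call(order, as.data.frame(mat)).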

This is a common primitive in programming languages where functions are first-class objects, notably (besides R's do.call()) JavaScript's apply(), Python's apply() (deprecated in Python 2 and removed in Python 3 in favor of the f(*args) syntax), and vim's call().
