This post covers how to create frequency counts from a vector in R.
Problem description
Suppose there is a numeric vector that may contain duplicated values:
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
I want to create another vector of counts, as follows.
- It has the same length as x.
- For each unique value in x, the first appearance is 1, the second appearance is 2, and so on.
The new vector I want is
1, 1, 1, 1, 1, 2, 2, 3, 2
I need a fast way of doing this, since x can be really long.
Recommended answer
Use ave and seq_along:
> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
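To make it clearer what the ave call is doing, here is a rough base R sketch of the same idea written out by hand: group the positions of x by value, then number the positions within each group. The names idx and counts are illustrative only and do not appear in the original answer.

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# For each distinct value of x, collect the positions where it occurs.
idx <- split(seq_along(x), x)

# Fill in 1, 2, 3, ... at those positions, one group at a time.
counts <- integer(length(x))
for (i in idx) {
  counts[i] <- seq_along(i)
}

counts
# [1] 1 1 1 1 1 2 2 3 2
```

One small difference: this sketch returns an integer vector, whereas ave returns a vector of the same type as x (double in this example). The values are the same.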
Another option to consider is data.table. Although it is a little more work, it might pay off on very long vectors.
Here it is on your example; it definitely seems like overkill!
library(data.table)
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
DT <- data.table(id = sequence(length(x)), x, key = "id")
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2
But how about on a vector 10,000,000 items long?
set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)
funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}
funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}
identical(funAve(), funDT())
# [1] TRUE
library(microbenchmark)
microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#  funDT()  1.967795 2.029697 2.053886 2.070462 2.123531    20
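As a side note that is not part of the original answer: reasonably recent versions of data.table export a helper, rowid(), that computes this kind of within-group running count directly, without building a table first. The sketch below assumes an installed data.table version that provides rowid(). It was not included in the benchmark above, so no timing is claimed for it.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# rowid() numbers the occurrences of each distinct value in order of appearance.
counts <- rowid(x)

# Check against the base R answer.
all(counts == ave(x, x, FUN = seq_along))
```

If it behaves as expected on your data, it could be added to the microbenchmark call alongside funAve() and funDT() to compare timings on your own machine.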