This article explains how to create a running frequency count from a vector in R.



Suppose there is a numeric vector that may contain duplicated values:

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)


I want to create another vector of counts as follows.

  1. It has the same length as x.
  2. For each unique value in x, the first appearance is 1, the second appearance is 2, and so on.


For the vector above, the result should be:

1, 1, 1, 1, 1, 2, 2, 3, 2

I need a fast way to do this, since x can be very long.


You can use ave() together with seq_along():

> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
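
As a rough sketch of what ave() is doing here: it groups the elements of x by their own value, applies seq_along() within each group, and puts the results back in the original positions. The base R snippet below mimics that with split(), lapply() and unsplit(). It is meant as an illustration of the idea, not as the actual internals of ave(), and the variable names are made up for this example.

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# group the elements of x by their own value
groups <- split(x, x)

# number the elements within each group: 1, 2, 3, ...
counts_by_group <- lapply(groups, seq_along)

# put the per-group counts back into the original order of x
counts <- unsplit(counts_by_group, x)
counts
# [1] 1 1 1 1 1 2 2 3 2
```

One small difference: this sketch returns an integer vector, while ave() returns a vector of the same type as x.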


Another option to consider is data.table. Although it is a little bit more work, it might pay off on very long vectors.


Here it is on your example--definitely seems like overkill!


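library(data.table)
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
# id records the original position; keying on it keeps rows in input order
DT <- data.table(id = sequence(length(x)), x, key = "id")
# within each group of equal x values, number the rows 1..N
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2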
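
If the counter itself is all that is needed, data.table also exports a helper, rowid(), that numbers elements within groups of equal values. A minimal sketch, assuming a data.table version recent enough to include rowid():

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# rowid() returns, for each element, its running occurrence number
# among the elements that share the same value
rowid(x)
# [1] 1 1 1 1 1 2 2 3 2
```

This avoids building a keyed table by hand. It was not part of the timings in this answer, so its speed relative to the other approaches is not shown here.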

But how about on a vector 10,000,000 items long?

library(data.table)
library(microbenchmark)

x2 <- sample(100, 1e7, replace = TRUE)

funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}

funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}

identical(funAve(), funDT())
# [1] TRUE

microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#   funDT() 1.967795 2.029697 2.053886 2.070462 2.123531    20
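
For readers who want to stay in base R, the same counts can also be built from a stable sort: order the vector, number the elements within each run of equal values, then write those numbers back to the original positions. The function name running_count below is hypothetical, the sketch assumes x contains no NA values, and it has not been benchmarked against the two approaches above.

```r
# running_count is a hypothetical helper name used only for this sketch
running_count <- function(x) {
  # order() leaves ties in their original order, so within each value
  # the earlier occurrences come first
  o <- order(x)
  # lengths of the runs of equal values in the sorted vector
  runs <- rle(x[o])$lengths
  # 1..n for each run, written back to the original positions
  out <- integer(length(x))
  out[o] <- sequence(runs)
  out
}

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
running_count(x)
# [1] 1 1 1 1 1 2 2 3 2
```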

