Creating a running frequency count from a vector in R

Question

Suppose there is a numeric vector that may contain duplicated values:

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

I want to create another vector of counts, as follows.

It has the same length as x.
For each unique value in x, the first occurrence is numbered 1, the second occurrence 2, and so on.

The new vector I want is:

1, 1, 1, 1, 1, 2, 2, 3, 2

I need a fast way of doing this, since x can be really long.

Recommended answer

Use ave and seq_along:

> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
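
To make it clearer what the ave call is doing, here is a rough by-hand sketch of the same idea in base R: split the positions by value, number each group, and write the numbers back into the original positions. The names groups, counts, labels and running_count are only illustrative. The second half assumes a character vector as input, in which case passing seq_along(labels) as the first argument keeps the result an integer vector rather than letting it be coerced to character.

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# Positions of each distinct value, e.g. value 2 sits at positions 2, 7 and 8.
groups <- split(seq_along(x), x)

# Number the occurrences inside each group and put them back in place.
counts <- integer(length(x))
for (idx in groups) {
  counts[idx] <- seq_along(idx)
}
counts
# [1] 1 1 1 1 1 2 2 3 2

# Non-numeric input (illustrative data): number positions, grouped by label.
labels <- c("a", "b", "a", "c", "b", "a")
running_count <- ave(seq_along(labels), labels, FUN = seq_along)
running_count
# [1] 1 1 2 1 2 3
```

The loop is only there to show the mechanics; for real use the one-line ave call above does the same job.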

Another option to consider is data.table. It takes a little more work, but it may pay off on very long vectors.

Here it is on your example. It definitely looks like overkill!

library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
DT <- data.table(id = sequence(length(x)), x, key = "id")
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2
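
As a possibly shorter variant, and assuming a reasonably recent data.table that exports rowid(), the same running count can be obtained without building a table first. This is a sketch rather than part of the original answer; the guard only skips the example when data.table is not installed.

```r
# Runs only when data.table is available; rowid() numbers repeated values 1, 2, 3, ...
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
  y <- data.table::rowid(x)
  print(y)
}
```

On the example vector this should give the same counts as the keyed data.table version above.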

But how about a vector 10,000,000 items long?

set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)

funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}

funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}

identical(funAve(), funDT())
# [1] TRUE

library(microbenchmark)
microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#   funDT() 1.967795 2.029697 2.053886 2.070462 2.123531    20
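
If adding a package is not an option, one vectorized base R alternative is to sort, measure the run lengths, and map the within-run numbers back to the original positions. The helper name running_count_sorted is hypothetical and not part of the answer above. It relies on order() leaving ties in their original relative order, and it has not been checked against NA values. No timing is claimed here; whether it beats ave on a given vector is something to measure.

```r
# Hypothetical helper: running count of each value via sort + run lengths.
running_count_sorted <- function(x) {
  ord <- order(x)                      # ties keep their original relative order
  run_lengths <- rle(x[ord])$lengths   # size of each block of equal values
  out <- integer(length(x))
  out[ord] <- sequence(run_lengths)    # 1..n inside each block, mapped back
  out
}

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
res <- running_count_sorted(x)
res
# [1] 1 1 1 1 1 2 2 3 2
```

It can be added to the benchmark as a third expression, alongside funAve() and funDT().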
