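r - Combining/summing two positions in an integer vector in R

I have a simple integer vector in R. I would like to randomly select n positions in the vector and "merge" them (i.e. sum them) within the vector. This process can happen several times: in a vector of length 100, for example, 5 merging/summing events might occur, merging 2, 3, 2, 4, and 2 vector positions respectively. For example: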

#An example original vector of length 10:
ex.have<-c(1,1,30,16,2,2,2,1,1,9)

#For simplicity assume some process randomly combines the
#first two [1,1] and last three [1,1,9] positions in the vector.

ex.want<-c(2,30,16,2,2,2,11)

#Here, there were two merging events of 2 and 3 vector positions, respectively

#EDIT: the merged positions do not need to be consecutive.
#They could be randomly selected from any position.

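In addition, I need to record how many vector positions were "merged" into each value (with a value of 1 for positions that were not merged); call these the indices. Since the first two and the last three positions were merged in the example above, the index data would look like this: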

ex.indices<-c(2,1,1,1,1,1,3)

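Finally, I need to put everything into a matrix, so the final data for the example above would be a two-column matrix with the integers in one column and the indices in the other: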

ex.final<-matrix(c(2,30,16,2,2,2,11,2,1,1,1,1,1,3),ncol=2,nrow=7)
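To make the target concrete, here is a minimal sketch of how the example matrix follows once the grouping is known. The label vector grp is a hypothetical hand-written input (equal labels mark positions that are merged); choosing it at random is the part still being asked about.

```r
ex.have <- c(1, 1, 30, 16, 2, 2, 2, 1, 1, 9)

# Hypothetical group labels: positions sharing a label are merged together.
grp <- c(1, 1, 2, 3, 4, 5, 6, 7, 7, 7)

merged  <- as.vector(rowsum(ex.have, grp))  # sum of the values in each group
indices <- tabulate(grp)                    # number of positions in each group

ex.check <- cbind(merged, indices)
ex.check
```

With these labels, ex.check holds the same values as ex.final above.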

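At the moment I am looking for help with even the simplest step: combining positions in the vector. I have tried several variations on the sample and split functions but have hit a dead end. For example, sum(sample(ex.have,2)) sums two randomly selected positions (and sum(sample(ex.have,rpois(1,2))) adds some randomness to the value of n), but I am not sure how to build on this to get the desired data set. An exhaustive search turned up several posts about combining vectors, but none about combining positions within a vector, so I apologize if this is a duplicate. Any advice on how to approach this would be much appreciated.

Best answer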

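Here is a function I designed to perform the task you described.

The vec_merge function takes the following arguments:

x: an integer vector.

event_perc: the proportion of events. This is a number greater than 0 and at most 1 (although 1 is probably too large). The number of events is the length of x multiplied by event_perc, rounded to the nearest integer.

sample_n: the merge sample sizes. This is an integer vector in which every number is greater than or equal to 2. The size of each event is drawn from it at random, with replacement.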
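As a rough illustration of how these arguments interact, the arithmetic can be traced by hand. The vector and the argument values below are made up for the sketch; only the formulas mirror the function.

```r
set.seed(1)
x          <- seq_len(100)   # placeholder vector of length 100
event_perc <- 0.05
sample_n   <- c(2, 3, 4)

# Number of merge events: length of x times event_perc, rounded
n_events <- round(length(x) * event_perc)

# Size of each event, drawn from sample_n with replacement
sizes <- sample(sample_n, size = n_events, replace = TRUE)

# Positions left unmerged, and the number of rows in the result
n_unmerged <- length(x) - sum(sizes)
n_rows     <- n_events + n_unmerged
```

Here n_events is 5, and the result has one row per event plus one row per unmerged position.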

vec_merge <- function(x, event_perc = 0.2, sample_n = c(2, 3)){
  # Check if event_perc makes sense
  if (event_perc > 1 || event_perc <= 0){
    stop("event_perc should be greater than 0 and at most 1.")
  }
  # Check if sample_n makes sense
  if (any(sample_n < 2)){
    stop("sample_n should be at least 2")
  }
  # Determine the event numbers
  n <- round(length(x) * event_perc)
  # Determine the sample number of each event
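  # Index into sample_n so that a length-one sample_n is not read as 1:sample_n
  sample_vec <- sample_n[sample.int(length(sample_n), size = n, replace = TRUE)]
  names(sample_vec) <- paste0("S", seq_len(n))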
  # Check if the sum of sample_vec is larger than the length of x
  # If yes, stop the function and print a message
  if (length(x) < sum(sample_vec)){
    stop("Too many samples. Decrease event_perc or sample_n")
  }
  # Determine the number that will not be merged
  n2 <- length(x) - sum(sample_vec)
  # Create a vector of n2 ones, one for each position that is not merged
  non_merge_vec <- rep(1, n2)
  names(non_merge_vec) <- paste0("N", seq_len(n2))
  # Combine sample_vec and non_merge_vec, then shuffle the combined vector
  combine_vec <- c(sample_vec, non_merge_vec)
  # Shuffle by index so the names survive even if there is only one element
  combine_vec2 <- combine_vec[sample.int(length(combine_vec))]
  # Expand the vector
  expand_list <- list(lengths = combine_vec2, values = names(combine_vec2))
  expand_vec <- inverse.rle(expand_list)
  # Create a data frame with x and expand_vec
  dat <- data.frame(number = x,
                    group = factor(expand_vec, levels = unique(expand_vec)))
  dat$index <- 1
  # Name the columns so the result is labelled number/index rather than V1/V2
  dat2 <- aggregate(cbind(number = dat$number, index = dat$index),
                    by = list(group = dat$group),
                    FUN = sum)
  # Remove the group column, then convert dat2 to a matrix
  dat2$group <- NULL
  mat <- as.matrix(dat2)
  return(mat)
}
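The key step in the function is the expansion with inverse.rle, which turns one label per group into one label per vector position. A small sketch on the example from the question, with the group sizes written out by hand rather than sampled (the names follow the S/N scheme used in vec_merge):

```r
ex.have <- c(1, 1, 30, 16, 2, 2, 2, 1, 1, 9)

# Group sizes in vector order: S = merged group, N = position left alone
sizes <- c(S1 = 2, N1 = 1, N2 = 1, N3 = 1, N4 = 1, N5 = 1, S2 = 3)

# Repeat each label as many times as its group is long
labels <- inverse.rle(list(lengths = sizes, values = names(sizes)))
# "S1" "S1" "N1" "N2" "N3" "N4" "N5" "S2" "S2" "S2"

# Fix the level order so groups stay in vector order, not alphabetical order
grp <- factor(labels, levels = unique(labels))

res <- cbind(number = as.vector(tapply(ex.have, grp, sum)),
             index  = as.vector(table(grp)))
res
```

Because each label covers a run of neighbouring positions, this approach always merges adjacent positions.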

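Here is a test of the function. I applied it to the sequence from 1 to 10. As you can see, in this example 4 and 5 were merged, and 8 and 9 were also merged. (Which positions get merged depends on the random seed and on the R version, since R 3.6.0 changed the default sampling algorithm.)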

set.seed(123)
vec_merge(1:10)
#      number index
# [1,]      1     1
# [2,]      2     1
# [3,]      3     1
# [4,]      9     2
# [5,]      6     1
# [6,]      7     1
# [7,]     17     2
# [8,]     10     1
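Note that vec_merge only ever merges neighbouring positions, while the edit in the question says the merged positions do not need to be consecutive. One possible variation is sketched below. The function merge_random and its sizes argument are hypothetical names introduced here, not part of the answer above, and placing each merged value at the smallest position of its event is just one arbitrary convention.

```r
merge_random <- function(x, sizes) {
  stopifnot(all(sizes >= 2), sum(sizes) <= length(x))
  # Every position starts out as its own group, labelled by its position
  group <- seq_along(x)
  # Draw all positions that take part in a merge, without replacement
  picked <- sample.int(length(x), sum(sizes))
  # Assign the drawn positions to events of the requested sizes
  event <- rep(seq_along(sizes), times = sizes)
  # All positions of an event share the label of the event's smallest position
  group[picked] <- ave(picked, event, FUN = min)
  cbind(value = as.vector(tapply(x, group, sum)),
        index = as.vector(tapply(x, group, length)))
}

set.seed(42)
ex.have <- c(1, 1, 30, 16, 2, 2, 2, 1, 1, 9)
res <- merge_random(ex.have, sizes = c(2, 3))
res
```

Whatever positions are drawn, the value column sums to sum(ex.have) and the index column sums to length(ex.have), which is a quick sanity check on any merging routine.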