Problem description
I am trying to find a proper way, in R, to find duplicated values, and add the value 1 to each subsequent duplicated value grouped by id. For example:
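library(data.table)
data <- data.table(id = c('1','1','1','1','1','2','2','2'),
                   value = c(95,100,101,101,101,20,35,38))
# Attempt: compare each value with the previous row and add 1 on a match.
# lag is assumed to be dplyr::lag, which matches the NA in the first row of the output.
data$new_value <- ifelse(data$value == dplyr::lag(data$value, 1),
                         dplyr::lag(data$value, 1) + 1, data$value)
data$desired_value <- c(95,100,101,102,103,20,35,38)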
This produces:
   id value new_value desired_value
1:  1    95        NA            95
2:  1   100       100           100
3:  1   101       101           101 # first 101 in id 1: add 0
4:  1   101       102           102 # second 101 in id 1: add 1
5:  1   101       102           103 # third 101 in id 1: add 2
6:  2    20        20            20
7:  2    35        35            35
8:  2    38        38            38
I tried doing this with ifelse, but it doesn't work recursively, so it only applies to the following row and not to any subsequent rows. Also, the lag function causes me to lose the first value in value.
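To make the limitation concrete, here is a minimal sketch in base R of the one-step comparison on the values for id 1. The names prev and one_step are illustrative only, and the shifted vector is built by hand rather than with lag, so no extra package is assumed.

```r
value <- c(95, 100, 101, 101, 101)

# Previous element of each row; the first row has no predecessor, hence NA
prev <- c(NA, head(value, -1))

# Add 1 only when a value equals the one directly before it
one_step <- ifelse(value == prev, prev + 1, value)
one_step
# [1]  NA 100 101 102 102
```

The third 101 is compared with the original 101 before it, not with the already incremented 102, so it also becomes 102. The leading NA comes from comparing the first value with a missing predecessor.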
I've seen examples with character variables with make.names or make.unique, but haven't been able to find a solution for a duplicated numeric value.
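For reference, this is roughly what make.unique does on a character vector (the input labels is a made-up example). It appends a suffix to repeated strings, so the result stays character and is not a numeric increment.

```r
labels <- c("a", "a", "b", "a")

# Repeated strings get ".1", ".2", ... appended; the first occurrence is unchanged
unique_labels <- make.unique(labels)
unique_labels
# [1] "a"   "a.1" "b"   "a.2"
```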
Background: I am doing a survival analysis and I am finding that with my data there are stop times that are the same, so I need to make it unique by adding a 1 (stop times are in seconds).
Recommended answer
Here's an attempt. You're essentially grouping by id and value and adding 0:(length(value)-1). So:
data[, onemore := value + (0:(.N-1)), by=.(id, value)]
#   id value new_value desired_value onemore
#1:  1    95        96            95      95
#2:  1   100       101           100     100
#3:  1   101       102           101     101
#4:  1   101       102           102     102
#5:  1   101       102           103     103
#6:  2    20        21            20      20
#7:  2    35        36            35      35
#8:  2    38        39            38      38
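The same grouping idea can also be sketched with data.table's rowid() or with base R's ave(); this is an alternative spelling, not part of the original answer. The column name onemore_base and the check n_dup are illustrative names. The check at the end is worth running on real data, because adding offsets could collide with a value that already exists in the same id (for example 101, 101, 102 would become 101, 102, 102).

```r
library(data.table)

data <- data.table(id = c("1", "1", "1", "1", "1", "2", "2", "2"),
                   value = c(95, 100, 101, 101, 101, 20, 35, 38))

# rowid() numbers rows 1, 2, 3, ... within each (id, value) combination,
# so subtracting 1 gives the offsets 0, 1, 2, ...
data[, onemore := value + rowid(id, value) - 1]

# Same idea in base R: ave() applies FUN within each (id, value) group
data[, onemore_base := ave(value, id, value,
                           FUN = function(x) x + seq_along(x) - 1)]

# Sanity check: count (id, onemore) pairs that are still duplicated
n_dup <- sum(duplicated(data[, .(id, onemore)]))
n_dup
# [1] 0
```

If n_dup is greater than zero, the offsets have run into existing values and a different tie-breaking rule would be needed.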