Problem description
I have data in the following format called DF (this is just a made up simplified sample):
eval.num  eval.count  fitness  fitness.mean  green.h.0  green.v.0  offset.0  random
1         1           1500     1500          100        120        40        232342
2         2           1000     1250          100        120        40        11843
3         3           1250     1250          100        120        40        981340234
4         4           1000     1187.5        100        120        40        4363453
5         1           2000     2000          200        100        40        345902
6         1           3000     3000          150        90         10        943
7         1           2000     2000          90         90         100       9304358
8         2           1800     1900          90         90         100       284333
However, the eval.count column is incorrect and I need to fix it. It should report the number of rows with the same values for (green.h.0, green.v.0, and offset.0) by only looking at the previous rows.
The example above uses the expected values, but assume they are incorrect.
How can I add a new column (say "count") which will count all previous rows which have the same values of the specified variables?
I have gotten help on a similar problem of just selecting all rows with the same values for specified columns, so I suppose I could just write a loop around that, but it seems inefficient to me.
Recommended answer
Ok, let's first do it in the easy case where you just have one column.
> data <- rep(sample(1000, 5), sample(5, 5))
> head(data)
[1] 435 435 435 278 278 278
Then you can just use rle to figure out the contiguous sequences:
> sequence(rle(data)$lengths)
[1] 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1
Or, in full:
> head(cbind(data, sequence(rle(data)$lengths)))
[1,] 435 1
[2,] 435 2
[3,] 435 3
[4,] 278 1
[5,] 278 2
[6,] 278 3
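To see why this works, it may help to look at what rle returns before sequence is applied. The sketch below uses a small fixed vector (illustrative values, not the random data above) so that the result is reproducible:

```r
# Small fixed vector so the result is reproducible (illustrative values).
x <- c(435, 435, 435, 278, 278, 91)

# rle() summarises the vector as contiguous runs.
runs <- rle(x)
runs$lengths   # length of each run: 3 2 1
runs$values    # value of each run: 435 278 91

# sequence() expands each run length n into 1..n and concatenates the pieces,
# which gives a running index within each run.
run_index <- sequence(runs$lengths)
run_index      # 1 2 3 1 2 1
```

Note that rle only groups adjacent equal values, so a value that reappears later in the vector starts a new run at 1.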
For your case with multiple columns, there are probably a bunch of ways of applying this solution. The easiest might be to just paste the columns you care about together to form a single vector.
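As a rough sketch of that idea, the code below rebuilds the question's sample as a data frame, pastes the three key columns into one string per row, and applies the same rle approach. The names key, count and count.any are made up for illustration. Because rle only sees contiguous runs, the last step shows an alternative using ave, which is not part of the answer above but should also work when matching rows are not adjacent.

```r
# Sample data from the question (eval.count and random omitted for brevity).
DF <- data.frame(
  eval.num  = 1:8,
  fitness   = c(1500, 1000, 1250, 1000, 2000, 3000, 2000, 1800),
  green.h.0 = c(100, 100, 100, 100, 200, 150, 90, 90),
  green.v.0 = c(120, 120, 120, 120, 100, 90, 90, 90),
  offset.0  = c(40, 40, 40, 40, 40, 10, 100, 100)
)

# Paste the key columns into a single string per row.
# The separator keeps e.g. (1, 23) and (12, 3) from producing the same key.
key <- paste(DF$green.h.0, DF$green.v.0, DF$offset.0, sep = "_")

# Contiguous-run version, as in the answer: index within each run of equal keys.
DF$count <- sequence(rle(key)$lengths)

# Assumed alternative (not from the answer): running count per key over all
# earlier rows, whether or not they are adjacent.
DF$count.any <- ave(seq_along(key), key, FUN = seq_along)

DF[, c("eval.num", "count", "count.any")]
```

On this sample both columns come out as 1 2 3 4 1 1 1 2, matching the expected eval.count values, because identical rows happen to be adjacent. They would differ if the same combination reappeared after other rows.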