Assign a value to each range of consecutive numbers with the same sign in R

Problem description

I'm trying to create a data frame where a column exists that holds values representing the length of runs of positive and negative numbers, like so:

Time  V  Length
0.5  -2  1.5
1.0  -1  1.5
1.5   0  0.0
2.0   2  1.0
2.5   0  0.0
3.0   1  1.75
3.5   2  1.75
4.0   1  1.75
4.5  -1  0.75
5.0  -3  0.75

The Length column sums the length of time that the value has been positive or negative. Zeros are given a 0 since they are an inflection point. If there is no zero separating the sign change, the values are averaged on either side of the inflection.

I am trying to approximate the amount of time that these values are spending either positive or negative. I've tried this with a for loop with varying degrees of success, but I would like to avoid looping because I am working with extremely large data sets.

I've spent some time looking at sign and diff as they are used in this question about sign changes. I've also looked at this question that uses transform and aggregate to sum consecutive duplicate values. I feel like I could use this in combination with sign and/or diff, but I'm not sure how to retroactively assign these sums to the ranges that created them or how to deal with spots where I'm taking the average across the inflection.

Any suggestions would be appreciated. Here is the sample dataset:

dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

Recommended answer

First find the indices of "Time" that need to be interpolated: consecutive "V" values that switch between positive and negative without a zero in between. For these, abs(diff(sign(V))) equals 2.

id <- which(abs(c(0, diff(sign(dat$V)))) == 2)
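To see why the value 2 picks out the right rows, it can help to print the intermediate vectors for the sample data. This is only an illustrative sketch in base R; the names `s` and `jumps` are introduced here and are not part of the original answer.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5),
                  V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

s <- sign(dat$V)         # -1 for negative, 0 for zero, 1 for positive
jumps <- c(0, diff(s))   # change in sign versus the previous row; the 0 pads row 1

# A direct flip between negative and positive changes the sign by 2 in
# absolute value, whereas passing through a zero changes it by 1 twice.
id <- which(abs(jumps) == 2)
id             # 9
dat$Time[id]   # 4.5
```

For the sample data the only direct flip is between Time 4.0 and 4.5, so that is the only place where a zero has to be interpolated.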

Add rows with average "Time" between relevant indices and corresponding "V" values of zero to the original data. Also add rows of "V" = 0 at "Time" = 0 and at last time step (according to the assumptions mentioned by @Gregor). Order by "Time".

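# Insert a zero at the midpoint of every direct sign flip, plus zeros at
# Time = 0 and at the last time step, then sort by Time.
d2 <- rbind(dat,
            data.frame(Time = (dat$Time[id] + dat$Time[id - 1])/2,
                       V = rep(0, length(id))),  # rep() keeps this valid when id is empty
            data.frame(Time = c(0, max(dat$Time)), V = c(0, 0))
            )
d2 <- d2[order(d2$Time), ]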

Calculate the time differences between consecutive rows where "V" is zero, and replicate them across each run using "zero-group indices" (the cumulative count of zeros).

d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]
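The indexing trick in that one-liner is easier to follow when it is split into pieces. The sketch below rebuilds `d2` from the sample data and names the intermediate vectors; `is_zero`, `gaps` and `grp` are hypothetical helper names used only for illustration.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5),
                  V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))
id <- which(abs(c(0, diff(sign(dat$V)))) == 2)
d2 <- rbind(dat,
            data.frame(Time = (dat$Time[id] + dat$Time[id - 1]) / 2, V = 0),
            data.frame(Time = c(0, max(dat$Time)), V = c(0, 0)))
d2 <- d2[order(d2$Time), ]

is_zero <- d2$V == 0
gaps <- diff(d2$Time[is_zero])   # time between successive zeros: 1.50 1.00 1.75 0.75
grp  <- cumsum(is_zero)          # which zero each row follows: 1 1 1 2 2 3 3 3 3 4 4 4 5

# Indexing gaps by grp repeats each gap over the rows that belong to it.
# The final zero row has group 5, which has no gap, so it gets NA;
# that row is not in dat and is dropped by the later merge().
d2$Length <- gaps[grp]
```

Each zero row opens a new group, so at this stage a zero row carries the length of the run that follows it. This is why the zeros still need to be reset to 0 at the end.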

Add the values to the original data:

merge(dat, d2)

#    Time  V Length
# 1   0.5 -2   1.50
# 2   1.0 -1   1.50
# 3   1.5  0   1.00
# 4   2.0  2   1.00
# 5   2.5  0   1.75
# 6   3.0  1   1.75
# 7   3.5  2   1.75
# 8   4.0  1   1.75
# 9   4.5 -1   0.75
# 10  5.0 -3   0.75

Set "Length" to 0 where V == 0.
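The answer gives no code for this last step. One possible way to do it, shown here as a self-contained sketch that repeats the earlier steps and stores the merged result in a hypothetical variable `res`, is a logical-index assignment:

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5),
                  V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

# Steps from the answer: locate direct sign flips, insert zeros, sort.
id <- which(abs(c(0, diff(sign(dat$V)))) == 2)
d2 <- rbind(dat,
            data.frame(Time = (dat$Time[id] + dat$Time[id - 1]) / 2, V = 0),
            data.frame(Time = c(0, max(dat$Time)), V = c(0, 0)))
d2 <- d2[order(d2$Time), ]
d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]

# Keep only the original rows, then zero out the inflection points.
res <- merge(dat, d2)
res$Length[res$V == 0] <- 0
res$Length   # 1.50 1.50 0.00 1.00 0.00 1.75 1.75 1.75 0.75 0.75
```

With this step applied, the "Length" column matches the table requested in the question.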
