Question
I'm trying to create a data frame where a column exists that holds values representing the length of runs of positive and negative numbers, like so:
Time V Length
0.5 -2 1.5
1.0 -1 1.5
1.5 0 0.0
2.0 2 1.0
2.5 0 0.0
3.0 1 1.75
3.5 2 1.75
4.0 1 1.75
4.5 -1 0.75
5.0 -3 0.75
The Length column sums the length of time that the value has been positive or negative. Zeros are given a 0 since they are an inflection point. If there is no zero separating a sign change, the inflection point is taken as the average of the times on either side of it.
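The expected Length values in the table can be checked by hand: each run lasts from one zero (real or interpolated) to the next. The sketch below reproduces that arithmetic for the sample data. It assumes, as the answer below does, that the first run starts at Time = 0 and the last run ends at the final time step.

```r
# Run boundaries for the sample data (assumption: series starts at Time 0
# and ends at the last observed time step)
boundaries <- c(0,               # assumed start of the series
                1.5,             # V == 0 at Time 1.5
                2.5,             # V == 0 at Time 2.5
                (4.0 + 4.5) / 2, # sign flips between 4.0 and 4.5 with no zero: midpoint
                5.0)             # assumed end of the series
run_len <- diff(boundaries)
run_len
# [1] 1.50 1.00 1.75 0.75
```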
I am trying to approximate the amount of time that these values are spending either positive or negative. I've tried this with a for loop with varying degrees of success, but I would like to avoid looping because I am working with extremely large data sets.
I've spent some time looking at sign and diff as they are used in this question about sign changes. I've also looked at this question that uses transform and aggregate to sum consecutive duplicate values. I feel like I could use this in combination with sign and/or diff, but I'm not sure how to retroactively assign these sums to the ranges that created them, or how to deal with the spots where I'm taking the average across the inflection.
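One base R way to "retroactively assign" per-run values to the rows that created them is `rle()` on the signs, then expanding the run index back to row level with `rep()`. This is a swapped-in technique, not the approach from the linked questions, and on its own it does not handle the half-step interpolation at sign flips without a zero.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

# Collapse the sign sequence into runs of equal sign
r <- rle(sign(dat$V))
r$values   # -1  0  1  0  1 -1
r$lengths  #  2  1  1  1  3  2

# Give every row the index of the run it belongs to
run_id <- rep(seq_along(r$lengths), r$lengths)
run_id     #  1 1 2 3 4 5 5 5 6 6

# Any per-run summary can be broadcast back to rows by indexing with run_id,
# e.g. the number of rows in each row's run
rows_in_run <- r$lengths[run_id]
rows_in_run # 2 2 1 1 1 3 3 3 2 2
```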
Any suggestions would be appreciated. Here is the sample dataset:
dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))
Recommended answer
First find the indices of "Time" that need to be interpolated: consecutive "V" values that switch between positive and negative without a zero in between. These have an abs(diff(sign(V))) equal to two.
id <- which(abs(c(0, diff(sign(dat$V)))) == 2)
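Spelling out the intermediate vectors for the sample data shows why the test is `== 2`: a sign step of 1 passes through zero, while a step of 2 jumps straight from -1 to 1 (or back). The leading 0 keeps the result aligned with the rows of `dat`, so `id` points at the second row of each flipping pair.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

s     <- sign(dat$V)   # -1 -1  0  1  0  1  1  1 -1 -1
jumps <- c(0, diff(s)) #  0  0  1  1 -1  1  0  0 -2  0

# Only a jump of +/-2 skips over zero
id <- which(abs(jumps) == 2)
id
# [1] 9
```

Row 9 (Time 4.5) is the first negative value after the run of positives ending at Time 4.0, so the crossing is interpolated between rows 8 and 9.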
Add rows to the original data with "Time" set to the average of the times at each relevant index and the one before it, and "V" set to zero. Also add rows with "V" = 0 at "Time" = 0 and at the last time step (following the assumptions mentioned by @Gregor). Order by "Time".
d2 <- rbind(dat,
data.frame(Time = (dat$Time[id] + dat$Time[id - 1])/2, V = 0),
data.frame(Time = c(0, max(dat$Time)), V = c(0, 0))
)
d2 <- d2[order(d2$Time), ]
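Calculate the time differences between successive rows where V is zero, and replicate them across all rows using "zero-group indices" (the cumulative count of zeros, so each row gets the gap that starts at the most recent zero at or before it).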
d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]
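The indexing trick is easier to see on a few hypothetical toy vectors. Each zero row opens a new group, so a zero row receives the length of the run that *follows* it (which is why the zero rows show 1.00 and 1.75 in the merged output below and have to be reset afterwards). The final zero indexes one past the end of the gaps vector and gets `NA`; in the answer that is the synthetic row at the last time step, which is not in the original data.

```r
# Hypothetical toy data: rows 1, 4 and 6 are zeros; rows 1 and 6 act as
# the synthetic start/end boundaries
times   <- c(0, 0.5, 1.0, 1.5, 2.0, 2.5)
is_zero <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

gaps <- diff(times[is_zero]) # zero-to-zero gaps: 1.5 1.0
grp  <- cumsum(is_zero)      # zero-group index per row: 1 1 1 2 2 3
out  <- gaps[grp]            # 1.5 1.5 1.5 1.0 1.0 NA
```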
Merge the lengths back onto the original data:
merge(dat, d2)
# Time V Length
# 1 0.5 -2 1.50
# 2 1.0 -1 1.50
# 3 1.5 0 1.00
# 4 2.0 2 1.00
# 5 2.5 0 1.75
# 6 3.0 1 1.75
# 7 3.5 2 1.75
# 8 4.0 1 1.75
# 9 4.5 -1 0.75
# 10 5.0 -3 0.75
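With no `by` argument, `merge()` joins on all shared columns (here Time and V) and, with the default `all = FALSE`, keeps only rows present in both data frames. That is why the synthetic rows (Time 0, the interpolated crossing at 4.25, and the trailing zero whose Length is `NA`) do not appear above. A toy illustration with hypothetical mini data frames:

```r
# Hypothetical mini data frames: x plays the role of dat, y the role of d2
x <- data.frame(Time = c(0.5, 1.0), V = c(-2, -1))
y <- data.frame(Time = c(0, 0.5, 1.0, 1.5), V = c(0, -2, -1, 0),
                Length = c(1.5, 1.5, 1.5, NA))

# Inner join on the shared columns Time and V: only the two rows of x survive,
# and the synthetic rows of y (including the NA one) are dropped
res <- merge(x, y)
```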
将长度"设置为 0
,其中 V == 0
.
Set "Length" to 0
where V == 0
.
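Putting the steps together, a loop-free sketch as a single function that also performs the final zeroing step. The name `run_lengths` is hypothetical. It assumes Time is sorted and increasing, that runs start at Time = 0 and end at the last time step (the @Gregor assumptions above), and that `dat` has no row with V == 0 at Time 0 or at the last time step (such a row would duplicate a synthetic zero and make `merge()` return a duplicated row).

```r
# Hypothetical helper wrapping the answer's steps
run_lengths <- function(dat) {
  # Rows where the sign flips without an intervening zero
  id <- which(abs(c(0, diff(sign(dat$V)))) == 2)

  # Synthetic zeros at interpolated crossings and at both ends of the series;
  # rep(0, length(id)) keeps this valid when there are no such flips
  d2 <- rbind(dat,
              data.frame(Time = (dat$Time[id] + dat$Time[id - 1]) / 2,
                         V = rep(0, length(id))),
              data.frame(Time = c(0, max(dat$Time)), V = c(0, 0)))
  d2 <- d2[order(d2$Time), ]

  # Spread each zero-to-zero gap over the rows from that zero to the next
  d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]

  # Keep only the original rows, then zero out the inflection points
  res <- merge(dat, d2)
  res$Length[res$V == 0] <- 0
  res
}

dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))
res <- run_lengths(dat)
res$Length
# [1] 1.50 1.50 0.00 1.00 0.00 1.75 1.75 1.75 0.75 0.75
```

This reproduces the Length column from the question. Every step is vectorised (`sign`, `diff`, `cumsum`, indexing, `merge`), so no explicit loop is needed.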