This article covers how to get the standard deviation of the previous n values for each row of a data.frame in R.
Problem description
I am struggling to get the standard deviation of the previous n values, or in my case, of the last 5 days.
I am using the following code as an example:
df<- data.frame(date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by="days"),
TRM= runif(122, min=3500, max=4100))
> df
date TRM
1 2019-12-01 3540.028
2 2019-12-02 3673.536
3 2019-12-03 3827.182
4 2019-12-04 3824.791
5 2019-12-05 3906.753
6 2019-12-06 3528.100
7 2019-12-07 3650.191
# ... with more rows
Then I use mutate to add some information that I need. Here are the last rows:
library(dplyr)  # provides mutate() and lag()
df <- mutate(df, diferencia = TRM - lag(TRM, 1),
             VAR = diferencia / lag(TRM, 1))
>df
date TRM diferencia VAR
118 2020-03-27 3779.479 -262.366328 -0.064912515
119 2020-03-28 3773.771 -5.708207 -0.001510316
120 2020-03-29 4097.078 323.307069 0.085672159
121 2020-03-30 3752.619 -344.459061 -0.084074332
122 2020-03-31 3707.442 -45.176979 -0.012038788
So what I need is the following:
- Create a column that has the sd for the column "VAR".
- The sd for each row must contain only the last 5 days of the column "VAR".
- If all this could be done with dplyr, that would be great. (Not necessary)
For example, for row 122 the result would be this:
> sd(df[118:122,4])
[1] 0.06630885
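As a rough check of what that number means, the sketch below recomputes it by hand from the five VAR values printed for rows 118 to 122. The variable names var_last5 and manual_sd are only illustrative. It relies on sd() being the sample standard deviation (denominator n - 1).

```r
# The five VAR values for rows 118 to 122, copied from the printed output above
var_last5 <- c(-0.064912515, -0.001510316, 0.085672159,
               -0.084074332, -0.012038788)

# Sample standard deviation written out: sqrt(sum((x - mean)^2) / (n - 1))
manual_sd <- sqrt(sum((var_last5 - mean(var_last5))^2) / (length(var_last5) - 1))

print(manual_sd)      # hand-computed value
print(sd(var_last5))  # built-in sd() on the same five values
```

Both lines should print a value that agrees with the 0.06630885 shown above, up to the rounding of the copied inputs.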
So what I want to get is this value for all the rows of my df. I used 5 days as an example, but I would like to be able to modify the range:
date TRM diferencia VAR diff5days
118 2020-03-27 3779.479 -262.366328 -0.064912515 0.05801765
119 2020-03-28 3773.771 -5.708207 -0.001510316 0.04799908
120 2020-03-29 4097.078 323.307069 0.085672159 0.06207932
121 2020-03-30 3752.619 -344.459061 -0.084074332 0.07522609
122 2020-03-31 3707.442 -45.176979 -0.012038788 0.06630885
Thanks!
Recommended answer
Here is a solution using Base R:
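df <- data.frame(date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by = "days"),
                 TRM = runif(122, min = 3500, max = 4100))
df$rownum <- seq_len(nrow(df))  # row index, shown as a column in the output
df$stDev <- NA
# rolling sd over the current row and the 4 rows before it
for (i in 5:nrow(df)) df$stDev[i] <- sd(df$TRM[(i - 4):i])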
...and the output:
> head(df,n = 10)
date TRM rownum stDev
1 2019-12-01 3553.666 1 NA
2 2019-12-02 4054.015 2 NA
3 2019-12-03 3976.555 3 NA
4 2019-12-04 3825.628 4 NA
5 2019-12-05 4036.383 5 208.01581
6 2019-12-06 3787.414 6 122.38142
7 2019-12-07 3886.663 7 103.45743
8 2019-12-08 3930.801 8 97.10099
9 2019-12-09 3626.911 9 155.10571
10 2019-12-10 3781.731 10 117.29726
>
We can verify the first three computed results (rows 5 to 7) as follows:
> # verify first three results
> sd(df$TRM[1:5])
[1] 208.0158
> sd(df$TRM[2:6])
[1] 122.3814
> sd(df$TRM[3:7])
[1] 103.4574
>
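The loop above works on TRM with a fixed window of 5, whereas the question asks about the VAR column and an adjustable range. One possible generalization is sketched below. The helper roll_sd, the column names sd5 and sd10, and the set.seed() call are assumptions added for illustration and are not part of the original answer.

```r
# Hypothetical helper: trailing-window standard deviation.
# x: numeric vector; n: window size (the current value plus the n - 1 before it).
roll_sd <- function(x, n = 5) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)  # not enough values for a single window
  for (i in n:length(x)) {
    out[i] <- sd(x[(i - n + 1):i])
  }
  out
}

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by = "days"),
                 TRM = runif(122, min = 3500, max = 4100))

# Base R equivalents of the question's mutate() step
df$diferencia <- c(NA, diff(df$TRM))
df$VAR <- df$diferencia / c(NA, head(df$TRM, -1))

df$sd5  <- roll_sd(df$VAR, n = 5)   # last 5 days
df$sd10 <- roll_sd(df$VAR, n = 10)  # same helper, different range

print(tail(df[, c("date", "VAR", "sd5", "sd10")], 5))
```

Because VAR is NA in the first row, the first non-NA value of sd5 appears in row 6 rather than row 5. The helper returns a vector of the same length as its input, so it should also be usable inside mutate(), for example mutate(df, sd5 = roll_sd(VAR, 5)).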