本文介绍了使用dplyr和tbrf滚动几何方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用dplyr和tbrf为每个类型"组计算90天的滚动几何平均值和第90个百分位数.下面的代码为每个日期(而不是每90天)生成百分位.它还错误地生成了重复的行.

I'd like to use dplyr and tbrf to calculate a 90 day rolling geometric mean and 90th percentile for each group 'Type'. The code below is generating percentiles for each date, not every 90 days. It also is wrongly generating duplicate rows.

旁注:我首先尝试使用%within%并在lubridate包中创建一个间隔.但是,dplyr目前不支持lubridate的Interval类,因此想尝试tbrf.我也尝试过tibbletime,RcppRoll和Zoo的Rollapply

side note:I first tried using %within% and creating an interval in the lubridate package.However, class Interval from lubridate is currently not supported in dplyr hence wanting to try tbrf. I have also tried tibbletime,RcppRoll and zoo's Rollapply

##sample data###

Value=c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
Dates = as.Date(c("2015-02-23","2015-04-20","2015-06-17",
"2015-08-20","2015-10-05","2015-12-22",
"2016-01-19","2016-03-29","2016-05-03",
"2016-07-21","2016-09-08","2016-11-07",
"2017-02-27","2017-04-19","2017-06-29",
"2017-08-24","2017-10-23","2017-12-28",
"2018-01-16","2018-03-14","2018-05-29",
"2018-07-24"))
Type = c(rep("A", 11), rep("B", 11))

df=data.frame(Value,Dates,Type)



######failed attempt 1####
df2=df %>% group_by(Type) %>% 
 tbr_gmean(Value, Dates, "days", 90) %>%
  tbr_misc(Value, Dates,"days",  90, quantile, .(0.9)) 

 ## failed attempt #2##
 start.date = min(df$Dates)
 breaks = seq(start.date - 30*3600*24, start.date + 30*3600*24, "90 days")
 df$group = cut(df$Dates, breaks=breaks)
 DF= df %>% group_by(Type,group) %>% 
 mutate(Count=n(),gm=geoMean(Value),
 percentile_90=quantile(Value,0.90))

推荐答案

尝试一下:

library(psych)
library(dplyr)
library(zoo)
dfmod<-df %>% 
    group_by(Type) %>% 
      arrange(Dates) %>%
        mutate(rnk = cumsum(c(TRUE, diff(Dates) > 5)))%>% #changed it from !=1 to reflect that you want the date difference to be within 5 days or less
          group_by(Type,rnk) %>% 
            mutate(GM = rollapply(Value, 2, geometric.mean, fill=NA, align="right"),
                    qt=rollapply(Value, 2, quantile, p=0.90, fill=NA, align="right")) #changed 5 to 2 so that the rolling sum is calculated for every 2 rows 
 head(dfmod)
## A tibble: 6 x 6
## Groups:   Type, rnk [1]
#  Dates      Type  Value   rnk    GM    qt
#  <date>     <fct> <dbl> <int> <dbl> <dbl>
#1 2018-10-03 A      35.3     1  NA    NA  
#2 2018-10-04 A      34.3     1  NA    NA  
#3 2018-10-05 A      34.6     1  NA    NA  
#4 2018-10-06 A      34.3     1  NA    NA  
#5 2018-10-07 A      34.1     1  34.5  35.1
#6 2018-10-08 A      34.7     1  34.4  34.6

这篇关于使用dplyr和tbrf滚动几何方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-31 10:41