This is my df (data.frame):

group1  group2   value
chr1     a        1
chr1     a        1
chr1     a        1
chr1     b        2.2
chr1     b        2.5
chr1     b        2.5
chr1     b        2.8
chr2     c        3.1
chr2     c        -3.2
chr2     c        -3.7
chr2     c        -3.1
chr2     d        4


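Within rows that share the same group1 and group2, if there are 3 or more consecutive entries in the value column that are all greater than 2 or all less than -2, the mean of those values should be computed; otherwise the original value should be kept.

The output should be: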

group1  group2   value      mean
chr1     a        1          1 # does not change because it's smaller than 2
chr1     a        1          1
chr1     a        1          1
chr1     b        2.2        2.5 # mean of 2.2, 2.5, 2.5, 2.8
chr1     b        2.5        2.5
chr1     b        2.5        2.5
chr1     b        2.8        2.5
chr2     c        3.1        3.1 # not used for mean calculation above (different group)
chr2     c        -3.2       -3.3 # mean of -3.2, -3.7, -3.1
chr2     c        -3.7       -3.3
chr2     c        -3.1       -3.3
chr2     d        4          4


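Any help is appreciated.

Best answer

Using DF as shown reproducibly in the Note at the end, create a grouping variable g with rleid from data.table; data.table is not used for anything else. Then define a function Mean that implements the rule from the question. Finally, use ave to apply Mean to value within each run identified by g.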

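library(data.table)
# run id: a new run starts whenever the sign class (1, 0, -1) or either group changes
g <- with(DF, rleid((value > 2) - (value < -2), group1, group2))
# average a run only if it has at least 3 values that are all > 2 or all < -2
Mean <- function(x) if ((all(x > 2) || all(x < -2)) && length(x) >= 3) mean(x) else x
transform(DF, value2 = ave(value, g, FUN = Mean))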


giving:

   group1 group2 value    value2
1    chr1      a   1.0  1.000000
2    chr1      a   1.0  1.000000
3    chr1      a   1.0  1.000000
4    chr1      b   2.2  2.500000
5    chr1      b   2.5  2.500000
6    chr1      b   2.5  2.500000
7    chr1      b   2.8  2.500000
8    chr2      c   3.1  3.100000
9    chr2      c  -3.2 -3.333333
10   chr2      c  -3.7 -3.333333
11   chr2      c  -3.1 -3.333333
12   chr2      d   4.0  4.000000
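
If data.table is not available, the run id can also be built in base R. The following is only a sketch under that assumption, not part of the original answer: the names s, key, g2 and res are made up for illustration, and the data frame is typed in directly so the block runs on its own. A new run starts whenever the combination of sign class, group1 and group2 differs from the previous row.

```r
# Same data as DF in the Note, entered directly so this block is self-contained
DF <- data.frame(
  group1 = rep(c("chr1", "chr2"), c(7, 5)),
  group2 = rep(c("a", "b", "c", "d"), c(3, 4, 4, 1)),
  value  = c(1, 1, 1, 2.2, 2.5, 2.5, 2.8, 3.1, -3.2, -3.7, -3.1, 4)
)

# Sign class: 1 if value > 2, -1 if value < -2, 0 otherwise
s <- with(DF, (value > 2) - (value < -2))

# One key per row; the run id increases each time the key changes
key <- paste(s, DF$group1, DF$group2, sep = "|")
g2 <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Rule from the question: average only runs of at least 3 values
# that are all above 2 or all below -2
Mean <- function(x) if ((all(x > 2) || all(x < -2)) && length(x) >= 3) mean(x) else x

res <- transform(DF, value2 = ave(value, g2, FUN = Mean))
print(res)
```

Row 8 (3.1) forms a run of its own because the sign class changes on the next row, so it keeps its original value even though it shares group1 and group2 with rows 9 to 11.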


Note

Lines <- "group1  group2   value
chr1     a        1
chr1     a        1
chr1     a        1
chr1     b        2.2
chr1     b        2.5
chr1     b        2.5
chr1     b        2.8
chr2     c        3.1
chr2     c        -3.2
chr2     c        -3.7
chr2     c        -3.1
chr2     d        4"
DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE)

Regarding r - "mean" of consecutive rows with values greater than x (by group), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60398840/
