Selecting a group that comes before certain observations in R

Problem description

data=structure(list(x1 = c(88L, 88L, 94L, 82L, 68L, 72L, 43L, 84L, 
65L, 91L, 65L, 80L, 82L, 63L, 67L, 58L, 100L, 32L, 75L, 66L, 
30L, 12L, 97L, 58L, 14L, 64L), group = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor")), .Names = c("x1", 
"group"), class = "data.frame", row.names = c(NA, -26L))

This data has a grouping variable, group (sex: male or female). I need the mean and the 25th percentile of x1 for ALL males that come before the females. Males that come after the females should not be touched, and the females should not be touched either. So the output should be:

x1  group   mean    25%
88  male    76,36   66,5
88  male    76,36   66,5
94  male    76,36   66,5
82  male    76,36   66,5
68  male    76,36   66,5
72  male    76,36   66,5
43  male    76,36   66,5
84  male    76,36   66,5
65  male    76,36   66,5
91  male    76,36   66,5
65  male    76,36   66,5
80  female      
82  female      
63  female      
67  female      
58  female      
100 female      
32  female      
75  male        
66  male        
30  male        
12  male        
97  male        
58  male        
14  male        
64  male        

How can I do this?

x1  group
88  male
88  male
94  male
82  male
68  male
72  male
43  male
84  male
65  male
91  male
65  male
80  female
82  female
63  female
67  female
58  female
100 female
32  female
**76,36 male
**76,36 male
30  male
12  male
**76,36 male
58  male
14  male
64  male

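This is the desired result; the rows marked ** are the ones where x1 has been replaced by the mean of the first male block.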

Recommended answer

library(dplyr)
library(data.table)

data %>%
  group_by(group, group2 = rleid(group)) %>%                       # group by gender and its run position (1 = first block)
  mutate(MEAN = mean(x1[group=="male" & group2==1]),               # calculate metrics only for males in position 1
         Q25 = quantile(x1[group=="male" & group2==1], 0.25)) %>%  # every other block gets NaN / NA
  ungroup() %>%                                                    # ungroup
  select(-group2) %>%                                              # remove the helper column
  data.frame()                                                     # only for display purposes

#     x1  group     MEAN  Q25
# 1   88   male 76.36364 66.5
# 2   88   male 76.36364 66.5
# 3   94   male 76.36364 66.5
# 4   82   male 76.36364 66.5
# 5   68   male 76.36364 66.5
# 6   72   male 76.36364 66.5
# 7   43   male 76.36364 66.5
# 8   84   male 76.36364 66.5
# 9   65   male 76.36364 66.5
# 10  91   male 76.36364 66.5
# 11  65   male 76.36364 66.5
# 12  80 female      NaN   NA
# 13  82 female      NaN   NA
# 14  63 female      NaN   NA
# 15  67 female      NaN   NA
# 16  58 female      NaN   NA
# 17 100 female      NaN   NA
# 18  32 female      NaN   NA
# 19  75   male      NaN   NA
# 20  66   male      NaN   NA
# 21  30   male      NaN   NA
# 22  12   male      NaN   NA
# 23  97   male      NaN   NA
# 24  58   male      NaN   NA
# 25  14   male      NaN   NA
# 26  64   male      NaN   NA
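
To see why grouping on rleid(group) isolates the first male block, it can help to look at the run ids on their own. The following is a minimal sketch on a toy vector, using base R's rle() as a stand-in for data.table::rleid(); the helper name run_id is hypothetical and not part of the answer's code.

```r
# Toy vector with the same shape as the question's data: male, female, male
g <- c("male", "male", "female", "female", "male")

# Hypothetical helper: number consecutive runs of equal values 1, 2, 3, ...
run_id <- function(x) {
  runs <- rle(as.character(x))
  rep(seq_along(runs$lengths), times = runs$lengths)
}

ids <- run_id(g)
print(ids)
# [1] 1 1 2 2 3
```

The males before the females get run id 1 and the males after them get run id 3, which is why the answer can filter on group2 == 1 even though both blocks share the same label.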

To update the x1 column according to the logic you described, you can use this:

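data %>%
  group_by(group, group2 = rleid(group)) %>%                       # group by gender and its run position
  mutate(MEAN = mean(x1[group=="male" & group2==1]),               # metrics only for the first male block
         Q25 = quantile(x1[group=="male" & group2==1], 0.25)) %>%
  ungroup() %>%                                                    # ungroup so the next step sees all rows
  mutate(x1 = ifelse(group=="male" & group2==3 & x1 > unique(Q25[!is.na(Q25)]), unique(MEAN[!is.na(MEAN)]), x1)) %>%  # replace only in the later male block
  select(-group2) %>%                                              # remove the helper column
  data.frame()                                                     # only for display purposes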

#           x1  group     MEAN  Q25
# 1   88.00000   male 76.36364 66.5
# 2   88.00000   male 76.36364 66.5
# 3   94.00000   male 76.36364 66.5
# 4   82.00000   male 76.36364 66.5
# 5   68.00000   male 76.36364 66.5
# 6   72.00000   male 76.36364 66.5
# 7   43.00000   male 76.36364 66.5
# 8   84.00000   male 76.36364 66.5
# 9   65.00000   male 76.36364 66.5
# 10  91.00000   male 76.36364 66.5
# 11  65.00000   male 76.36364 66.5
# 12  80.00000 female      NaN   NA
# 13  82.00000 female      NaN   NA
# 14  63.00000 female      NaN   NA
# 15  67.00000 female      NaN   NA
# 16  58.00000 female      NaN   NA
# 17 100.00000 female      NaN   NA
# 18  32.00000 female      NaN   NA
# 19  76.36364   male      NaN   NA
# 20  66.00000   male      NaN   NA
# 21  30.00000   male      NaN   NA
# 22  12.00000   male      NaN   NA
# 23  76.36364   male      NaN   NA
# 24  58.00000   male      NaN   NA
# 25  14.00000   male      NaN   NA
# 26  64.00000   male      NaN   NA

The extra piece of code I added (the second mutate) updates x1 only for males that come after the females (i.e. group2 == 3), and only if x1 is bigger than the quantile value.
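
The filter group2 == 3 assumes the data has exactly the male, female, male layout shown in the question. As a hedged alternative, the same logic can be sketched in base R without hard-coding the run number: compute the statistics once from the first male run, then apply them to every later male run. The toy data and the names run, first_male_run, m and q25 below are assumptions made for illustration only.

```r
# Toy data shaped like the question's: a male run, a female run, a male run
df <- data.frame(
  x1 = c(88, 94, 43, 65, 80, 63, 75, 30, 97),
  group = c("male", "male", "male", "male", "female", "female",
            "male", "male", "male"),
  stringsAsFactors = FALSE
)

# Run id per row (base-R stand-in for data.table::rleid)
runs <- rle(as.character(df$group))
df$run <- rep(seq_along(runs$lengths), times = runs$lengths)

# Statistics from the first male run only
first_male_run <- min(df$run[df$group == "male"])
first_male <- df$x1[df$run == first_male_run]
m <- mean(first_male)
q25 <- unname(quantile(first_male, 0.25))

# Replace x1 by the mean in later male runs where x1 exceeds the quantile
later_male <- df$group == "male" & df$run > first_male_run
df$x1 <- ifelse(later_male & df$x1 > q25, m, df$x1)
df$run <- NULL  # drop the helper column

print(df)
```

With this toy data the mean is 72.5 and the 25th percentile is 59.5, so the later male values 75 and 97 are replaced while 30 is kept.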

