Problem description
Say I have this dataset:
mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "25481МСК", class = "factor"),
item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(4L,
1L, 10L, 6L, 8L, 3L, 11L, 6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L,
4L, 15L, 10L, 6L, 6L, 5L, 4L, 4L, 1L, 10L, 6L, 8L, 3L, 11L,
6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L, 5L,
4L), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L)), .Names = c("code", "item", "sales", "action"
), class = "data.frame", row.names = c(NA, -44L))
I have two grouping variables, code and item. Here are the two groups:
25481МСК 13163
25480МСК 13164
I also have an action column, which can only take the values 0 or 1. For each run of ones in the action column, I need the median of sales over the three zero-action rows immediately before the run and the three zero-action rows immediately after it.
Here is an example:
sales action output
2 0 2
4 0 4
3 0 3
10 1 **5**
4 1 **5**
15 1 **5**
10 0 10
6 0 6
6 0 6
median =(2,4,3),(10,6,6)= 5
So the median over the zero-action rows before and after the run of ones is 5, and the sales values in the rows where action is 1 (the run enclosed by those zeros) are then replaced by this median. The data contain other runs of ones enclosed by zeros, and the same rule must be applied to each of them. However, a value is replaced only if it is greater than the median; if the median is greater than the sales value, the value is left unchanged.
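As a quick check of the arithmetic above, the six neighbouring zero-action sales can be pooled and passed to base R's median(). The vector names are illustrative only and do not come from the question.

```r
# Sales of the three action == 0 rows before and after the run of ones
before <- c(2, 4, 3)
after  <- c(10, 6, 6)

# Pool both sides and take a single median over all six values
med <- median(c(before, after))
med
```

With an even number of values, median() averages the two middle ones (here 4 and 6).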
For example, suppose we have
sales action
10 1
5 1
14 1
and the median of the zero-action rows is 12. In this case the output would be
output
10
5
12
Only 14 is replaced, because it is greater than the median.
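The "replace only when sales exceed the median" rule amounts to an element-wise minimum, which base R provides as pmin(). A minimal sketch using the numbers from this example (the variable names are assumptions):

```r
sales <- c(10, 5, 14)  # sales in the rows where action == 1
med   <- 12            # median of the surrounding action == 0 rows

# pmin() keeps each value that is at or below the median and caps the rest
output <- pmin(sales, med)
output
```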
So the actual result for the example above is:
sales action output
2 0 2
4 0 4
3 0 3
10 1 **5**
4 1 **4**
15 1 **5**
10 0 10
6 0 6
6 0 6
This should be done separately for each group:
25481МСК 13163
25480МСК 13164
Desired output:
code item sales action output
1 25481МСК 13163 4 0 4
2 25481МСК 13163 1 0 1
3 25481МСК 13163 10 0 10
4 25481МСК 13163 6 0 6
5 25481МСК 13163 8 0 8
6 25481МСК 13163 3 0 3
7 25481МСК 13163 11 0 11
8 25481МСК 13163 6 0 6
9 25481МСК 13163 4 0 4
10 25481МСК 13163 2 0 2
11 25481МСК 13163 4 0 4
12 25481МСК 13163 2 0 2
13 25481МСК 13163 4 0 4
14 25481МСК 13163 3 0 3
15 25481МСК 13163 10 1 5
16 25481МСК 13163 4 1 5
17 25481МСК 13163 15 1 5
18 25481МСК 13163 10 0 10
19 25481МСК 13163 6 0 6
20 25481МСК 13163 6 0 6
21 25481МСК 13163 5 0 5
22 25481МСК 13163 4 0 4
23 25481МСК 13164 4 0 4
24 25481МСК 13164 1 0 1
25 25481МСК 13164 10 0 10
26 25481МСК 13164 6 0 6
27 25481МСК 13164 8 0 8
28 25481МСК 13164 3 0 3
29 25481МСК 13164 11 0 11
30 25481МСК 13164 6 0 6
31 25481МСК 13164 4 0 4
32 25481МСК 13164 2 0 2
33 25481МСК 13164 4 0 4
34 25481МСК 13164 2 0 2
35 25481МСК 13164 4 0 4
36 25481МСК 13164 3 0 3
37 25481МСК 13164 10 1 5
38 25481МСК 13164 4 1 5
39 25481МСК 13164 15 1 5
40 25481МСК 13164 10 0 10
41 25481МСК 13164 6 0 6
42 25481МСК 13164 6 0 6
43 25481МСК 13164 5 0 5
44 25481МСК 13164 4 0 4
Note that the sales value for rows with action = 0 should also appear in the output column. How can I do this?
P.S. Please ignore the fact that some medians in this output are greater than the sales value. It is just a test.
code item sales action output
52382МСК 11709 1 0 1
52382МСК 11709 10 1 NA
52382МСК 11709 1 0 1
52382МСК 11709 3 0 3
Recommended answer
I think this gets close to a solution. (To be honest, I'm not sure I fully understand the question.)
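library(dplyr)

# Note: this assumes every run of action == 1 is exactly three rows long and has
# at least three rows on each side. It indexes mydat as a whole, not per code/item.
replacements <-
  tibble(
    action1 = which(mydat$action == 1L),        # row numbers where action is 1
    group = rep(seq_along(action1), each = 3,   # one id per run of three ones
                length.out = length(action1)),
    sales1 = mydat$sales[action1],              # sales inside the run
    sales_before = mydat$sales[action1 - 3L],   # the three rows before the run
    sales_after = mydat$sales[action1 + 3L]     # the three rows after the run
  ) %>%
  group_by(group) %>%
  mutate(
    med = median(c(sales_before, sales_after)), # median of the six neighbours
    output = pmin(sales1, med)                  # replace only values above the median
  )

# Rows with action == 0 keep their sales value; the ones get the capped value
mydat$output <- mydat$sales
mydat$output[replacements$action1] <- replacements$output
mydat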
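The code above relies on each run of ones being exactly three rows long. A base R sketch that allows runs of any length and works per code/item group might look like the following. The helper name cap_by_neighbour_median, the toy data frame, and the choice to return NA when fewer than three zeros are available on either side (as in the small 52382МСК table in the question) are assumptions, not something the question or the answer confirms.

```r
# Hypothetical helper: for each run of action == 1, cap sales at the median of
# the k zero-action rows just before and just after the run.
cap_by_neighbour_median <- function(sales, action, k = 3L) {
  out    <- as.numeric(sales)
  idx    <- seq_along(sales)
  runs   <- rle(action)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  for (i in which(runs$values == 1)) {
    run    <- starts[i]:ends[i]
    # zero-action rows within k rows before / after the run
    before <- idx[idx >= starts[i] - k & idx < starts[i] & action == 0]
    after  <- idx[idx > ends[i] & idx <= ends[i] + k & action == 0]
    if (length(before) < k || length(after) < k) {
      out[run] <- NA_real_              # assumption: incomplete window gives NA
    } else {
      med <- median(sales[c(before, after)])
      out[run] <- pmin(sales[run], med) # replace only values above the median
    }
  }
  out
}

# Toy data (not the question's mydat): two items with the same nine rows each
toy <- data.frame(
  code   = "A",
  item   = rep(c(1L, 2L), each = 9),
  sales  = rep(c(2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L), times = 2),
  action = rep(c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), times = 2)
)

# Apply the helper separately to the rows of each code/item group
toy$output <- NA_real_
groups <- split(seq_len(nrow(toy)), list(toy$code, toy$item), drop = TRUE)
for (rows in groups) {
  toy$output[rows] <- cap_by_neighbour_median(toy$sales[rows], toy$action[rows])
}
toy
```

Because the helper only ever sees the rows of one group, the windows before and after a run cannot reach into a neighbouring group. The same loop should work on mydat by swapping toy for mydat.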