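I have two columns, HHVEH and SAMPN, where SAMPN identifies the group. All members of a SAMPN group have the same HHVEH. I want to define a new column that is 2 for the first HHVEH rows of each group and NA for the remaining rows.
Example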
SAMPN PERNO HHVEH
1 1 1
1 2 1
1 3 1
2 1 2
3 2 2
3 3 2
4 4 0
4 3 0
Output
SAMPN PERNO HHVEH mode.car
1 1 1 2
1 2 1 NA
1 3 1 NA
2 1 2 2
3 2 2 2
3 3 2 2
4 4 0 NA
4 3 0 NA
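Explanation: the first group has HHVEH == 1, so its first row is 2 and the other rows are NA. The second group has HHVEH == 2, so its first two rows should be 2, but it has only one row, so that row is 2. The third group has HHVEH == 2, so both of its rows get 2. The last group has HHVEH == 0, so all of its rows are NA.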
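The rule in the explanation can be sketched for a single group in base R. This is only an illustration: `mode_car_for_group` is a hypothetical helper name, and it assumes `k` (the group's HHVEH) is a non-negative whole number.

```r
# Hypothetical helper: mode.car values for one group with n rows and HHVEH == k.
mode_car_for_group <- function(n, k) {
  n_marked <- min(n, k)  # number of rows that get 2, capped at the group size
  rep(c(2, NA), times = c(n_marked, n - n_marked))
}

mode_car_for_group(3, 1)  # group 1: 2 NA NA
mode_car_for_group(1, 2)  # group 2: 2
mode_car_for_group(2, 2)  # group 3: 2 2
mode_car_for_group(2, 0)  # group 4: NA NA
```

The cap with `min()` is what handles the second group, where HHVEH is larger than the number of rows.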
df1 <- structure(list(SAMPN = c(" 827", " 827", " 1133", " 1133",
" 1133", " 1133", " 1133", " 1133", " 1857", " 1857", " 1857"
), HHVEH = c(3, 3, 2, 2, 2, 2, 2, 2, 3, 3, 3), PERNO = structure(c(2L,
4L, 4L, 3L, 3L, 5L, 1L, 1L, 3L, 2L, 3L), .Label = c("1", "2",
"3", "4", "5", "6", "7"), class = "factor")), row.names = c(NA,
-11L), groups = structure(list(SAMPN = c(" 827", " 1133", " 1857"
), .rows = list(1:2, 3:8, 9:11)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
SAMPN HHVEH PERNO mode.car
<chr> <dbl> <fct>
1 " 827" 3 2 2
2 " 827" 3 4 2
3 " 1133" 2 4 2
4 " 1133" 2 3 2
5 " 1133" 2 3 NA
6 " 1133" 2 5 NA
7 " 1133" 2 1 NA
8 " 1133" 2 1 NA
9 " 1857" 3 3 2
10 " 1857" 3 2 2
11 " 1857" 3 3 2
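Best answer
Here is an option based on the updated data. After grouping by 'SAMPN', create 'mode.car' with rep: repeat 2 as many times as the first value of 'HHVEH' (capped at the group size), and fill the remaining rows of the group with NA.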
library(dplyr)

df1 %>%
  group_by(SAMPN) %>%
  # 2 for the first min(n(), HHVEH) rows of each group, NA for the remaining rows
  mutate(mode.car = rep(c(2, NA_real_),
                        c(pmin(n(), first(HHVEH)), pmax(0, n() - first(HHVEH)))))
# A tibble: 11 x 4
# Groups: SAMPN [3]
# SAMPN HHVEH PERNO mode.car
# <chr> <dbl> <fct> <dbl>
# 1 " 827" 3 2 2
# 2 " 827" 3 4 2
# 3 " 1133" 2 4 2
# 4 " 1133" 2 3 2
# 5 " 1133" 2 3 NA
# 6 " 1133" 2 5 NA
# 7 " 1133" 2 1 NA
# 8 " 1133" 2 1 NA
# 9 " 1857" 3 3 2
#10 " 1857" 3 2 2
#11 " 1857" 3 3 2
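For comparison, the same rule can be sketched without dplyr. This is not the answer's method but a base R alternative using `ave`, applied to the small example from the question; the data frame name `df` is an assumption, and the sketch relies on HHVEH being constant within each SAMPN and on the existing row order within each group.

```r
# Toy data from the question's first example
df <- data.frame(
  SAMPN = c(1, 1, 1, 2, 3, 3, 4, 4),
  PERNO = c(1, 2, 3, 1, 2, 3, 4, 3),
  HHVEH = c(1, 1, 1, 2, 2, 2, 0, 0)
)

# Within each SAMPN group, mark a row with 2 if its position in the group
# is at most the group's HHVEH; otherwise leave it as NA.
df$mode.car <- ave(df$HHVEH, df$SAMPN, FUN = function(h) {
  ifelse(seq_along(h) <= h[1], 2, NA_real_)
})

print(df)
```

Comparing the row position against HHVEH avoids computing the two repeat counts explicitly, at the cost of an `ifelse` call per group.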
For more on "r - define a new variable that counts group elements using another column", see the similar question on Stack Overflow: https://stackoverflow.com/questions/58050914/