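I have two columns, HHVEH and the grouping column SAMPN. All members of a SAMPN group share the same HHVEH. I want to define a new column that is 2 for the first rows of each group, up to HHVEH rows, and NA for the rest.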


          SAMPN      PERNO HHVEH
            1          1     1
            1          2     1
            1          3     1
            2          1     2
            3          2     2
            3          3     2
            4          4     0
            4          3     0

Expected output:
          SAMPN      PERNO HHVEH      mode.car
            1          1     1           2
            1          2     1           NA
            1          3     1           NA
            2          1     2           2
            3          2     2           2
            3          3     2           2
            4          4     0          NA
            4          3     0          NA

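Explanation: the first group has HHVEH == 1, so its first row is 2 and the other rows are NA. The second group has HHVEH == 2, so its first 2 rows should be 2, but it has only one row, so that row is 2. The third group has HHVEH == 2 and both of its rows get 2. The last group has HHVEH == 0, so all of its rows are NA.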
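To make the rule concrete, here is a minimal base R sketch applied to the small example above. The names `df` and `pos` are only illustrative and are not part of the original question.

```r
# Small example from the question, rebuilt by hand
df <- data.frame(
  SAMPN = c(1, 1, 1, 2, 3, 3, 4, 4),
  PERNO = c(1, 2, 3, 1, 2, 3, 4, 3),
  HHVEH = c(1, 1, 1, 2, 2, 2, 0, 0)
)

# Position of each row within its SAMPN group (1, 2, 3, ...)
pos <- ave(seq_along(df$SAMPN), df$SAMPN, FUN = seq_along)

# 2 for the first HHVEH rows of each group, NA otherwise
df$mode.car <- ifelse(pos <= df$HHVEH, 2, NA)
df
```

A row gets 2 only while its position inside the group does not exceed the group's HHVEH, which is why the HHVEH == 0 group ends up entirely NA.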
Updated data (dput output), followed by the expected output:

df1 <- structure(list(SAMPN = c("  827", "  827", " 1133", " 1133",
" 1133", " 1133", " 1133", " 1133", " 1857", " 1857", " 1857"
), HHVEH = c(3, 3, 2, 2, 2, 2, 2, 2, 3, 3, 3), PERNO = structure(c(2L,
4L, 4L, 3L, 3L, 5L, 1L, 1L, 3L, 2L, 3L), .Label = c("1", "2",
"3", "4", "5", "6", "7"), class = "factor")), row.names = c(NA,
-11L), groups = structure(list(SAMPN = c("  827", " 1133", " 1857"
), .rows = list(1:2, 3:8, 9:11)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))


  SAMPN   HHVEH PERNO      mode.car
   <chr>   <dbl> <fct>
 1 "  827"     3 2            2
 2 "  827"     3 4            2
 3 " 1133"     2 4            2
 4 " 1133"     2 3            2
 5 " 1133"     2 3            NA
 6 " 1133"     2 5            NA
 7 " 1133"     2 1            NA
 8 " 1133"     2 1            NA
 9 " 1857"     3 3            2
10 " 1857"     3 2            2
11 " 1857"     3 3            2

Best answer

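Here is an option based on the updated data. After grouping by 'SAMPN', create 'mode.car' with rep: repeat 2 as many times as the first value of 'HHVEH' (capped at the group size), and fill the remaining rows of the group with NA.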
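As a rough illustration of how the two counts passed to rep behave for a single group, the following sketch uses a hypothetical helper `fill_group` (not part of the answer), where `n` is the group size and `k` is the group's HHVEH value.

```r
# Hypothetical helper: n = number of rows in the group, k = HHVEH of the group
fill_group <- function(n, k) {
  # first count: how many 2s (never more than the group size)
  # second count: how many NAs (never negative)
  rep(c(2, NA), times = c(min(n, k), max(0, n - k)))
}

fill_group(6, 2)  # group larger than HHVEH: 2 2 NA NA NA NA
fill_group(2, 3)  # group smaller than HHVEH: 2 2
fill_group(2, 0)  # HHVEH of 0: NA NA
```

The min and max guards keep the two counts summing to the group size, which is what mutate requires of the result.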

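library(dplyr)
df1 %>%
   group_by(SAMPN) %>%
   # repeat 2 for the first HHVEH rows of each group (capped at the group size),
   # then NA for any remaining rows
   mutate(mode.car = rep(c(2, NA_real_),
           c(pmin(n(), first(HHVEH)), pmax(0, n() - first(HHVEH)))))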
# A tibble: 11 x 4
# Groups:   SAMPN [3]
#   SAMPN   HHVEH PERNO mode.car
#   <chr>   <dbl> <fct>    <dbl>
# 1 "  827"     3 2            2
# 2 "  827"     3 4            2
# 3 " 1133"     2 4            2
# 4 " 1133"     2 3            2
# 5 " 1133"     2 3           NA
# 6 " 1133"     2 5           NA
# 7 " 1133"     2 1           NA
# 8 " 1133"     2 1           NA
# 9 " 1857"     3 3            2
#10 " 1857"     3 2            2
#11 " 1857"     3 3            2
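
The same result can probably be reached by comparing each row's position within its group to HHVEH. This is an alternative sketch rather than the answer's method. It assumes dplyr is installed, and it rebuilds only the SAMPN and HHVEH columns of the updated data by hand under the name `df1`.

```r
library(dplyr)

# SAMPN and HHVEH values copied from the updated data; PERNO is omitted here
df1 <- tibble(
  SAMPN = c("  827", "  827", " 1133", " 1133", " 1133", " 1133",
            " 1133", " 1133", " 1857", " 1857", " 1857"),
  HHVEH = c(3, 3, 2, 2, 2, 2, 2, 2, 3, 3, 3)
)

res <- df1 %>%
  group_by(SAMPN) %>%
  # row_number() counts rows within each group; keep 2 for the first HHVEH rows
  mutate(mode.car = if_else(row_number() <= first(HHVEH), 2, NA_real_)) %>%
  ungroup()
res
```

This version avoids computing the two repeat counts by hand, at the cost of evaluating a condition for every row.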

For more on "r - define a new variable that counts group elements based on another column", see the similar question on Stack Overflow: https://stackoverflow.com/questions/58050914/
