Suppose I have a data frame like this:
   family relationship meanings            edu
1       1 A            respondent           12
2       1 B            respondent's spouse  18
3       1 C            A's father           10
4       1 D            A's mother            9
5       1 E1           A's first son        15
6       1 F1           E1's spouse          14
7       1 G11          E1's first son        3
8       1 G12          E1's second son       1
9       1 E2           A's second son       13
10      2 A            respondent           21
11      2 B            respondent's spouse  16
12      2 C            A's father           12
13      2 D            A's mother           16
14      2 E1           A's first son        18
15      2 F1           E1's spouse          15
16      2 E2           A's second son       17
17      2 E3           A's third son        16
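The family column is the family ID. The relationship column encodes each person's relationship within a family, and the meanings column explains what the code in relationship stands for. For each person, I want to calculate the highest education in the father's generation of the same family. We do not need the spouses' information in the result.
The expected result is: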
   family id  edu fedu
1       1 A    12   10
2       1 C    10   NA
3       1 E1   15   18
4       1 E2   13   18
5       1 G11   3   15
6       1 G12   1   15
7       2 A    21   16
8       2 C    12   NA
9       2 E1   18   21
10      2 E2   17   21
11      2 E3   16   21
The data:
d = structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"), meanings = c("respondent", "respondent's spouse", "A's father","A's mother", "A's first son", "E1's spouse", "E1's first son","E1's second son", "A's second son", "respondent", "respondent's spouse","A's father", "A's mother", "A's first son", "E1's spouse", "A's second son","A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21,16, 12, 16, 18, 15, 17, 16)), row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"))
Best answer
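Here is what I tried. I think it is necessary to create a generation variable. Judging from the example image in the question, C and D are the first generation, A and B are the second, E and F are the third, and G is the fourth. The first mutate() with case_when() creates this generation variable. Then I defined groups by family and generation and, for each group, found the longest education duration (max_ed_duration). Since you said you do not need the spouses' information, I then removed the rows whose meanings contains "mother" or "spouse". Next, I grouped by family again. Within each family, if generation is 1, NA is assigned to fedu; otherwise fedu gets the max_ed_duration of the previous generation. Finally, I arranged the data by family and relationship.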
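library(dplyr)

# Code each person's generation from the relationship label
mutate(mydf, generation = case_when(relationship %in% c("C", "D") ~ 1,
                                    relationship %in% c("A", "B") ~ 2,
                                    grepl(x = relationship, pattern = "^[EF]") ~ 3,
                                    grepl(x = relationship, pattern = "^G") ~ 4)) %>%
  # Highest education per family and generation (spouses still included here)
  group_by(family, generation) %>%
  mutate(max_ed_duration = max(edu)) %>%
  # Drop mothers and spouses from the rows that are reported
  filter(!grepl(x = meanings, pattern = "mother|spouse")) %>%
  # Within each family, look up the previous generation's maximum
  group_by(family) %>%
  mutate(fedu = if_else(generation == 1,
                        NA_real_,
                        max_ed_duration[match(x = generation - 1, table = generation)])) %>%
  arrange(family, relationship)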
#    family relationship meanings          edu generation max_ed_duration  fedu
#     <dbl> <chr>        <chr>           <dbl>      <dbl>           <dbl> <dbl>
#  1      1 A            respondent         12          2              18    10
#  2      1 C            A's father         10          1              10    NA
#  3      1 E1           A's first son      15          3              15    18
#  4      1 E2           A's second son     13          3              15    18
#  5      1 G11          E1's first son      3          4               3    15
#  6      1 G12          E1's second son     1          4               3    15
#  7      2 A            respondent         21          2              21    16
#  8      2 C            A's father         12          1              16    NA
#  9      2 E1           A's first son      18          3              18    21
# 10      2 E2           A's second son     17          3              18    21
# 11      2 E3           A's third son      16          3              18    21
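To see how the look-up in the last mutate() works, the match() call can be run on its own. This is a minimal sketch that reuses the generation and max_ed_duration values of the six family 1 rows shown above; idx is a name introduced here purely for illustration.

```r
# Values of the six family 1 rows that remain after the filter() step
generation      <- c(2, 1, 3, 3, 4, 4)
max_ed_duration <- c(18, 10, 15, 15, 3, 3)

# For every row, the position of the first row of the previous generation
idx <- match(generation - 1, generation)
# idx is 2 NA 1 1 3 3 (generation 1 has no previous generation)

# Indexing with NA yields NA, so generation 1 ends up without a value
fedu <- max_ed_duration[idx]
# fedu is 10 NA 18 18 15 15
```

Because max_ed_duration is computed before filter(), the value that is picked up still reflects spouses and mothers. The look-up does assume that at least one row of the previous generation survives the filter; if none did, match() would find nothing and fedu would be NA for that generation's children.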
Data
mydf <- structure(list(family = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2), relationship = c("A", "B", "C", "D", "E1", "F1",
"G11", "G12", "E2", "A", "B", "C", "D", "E1", "F1", "E2", "E3"
), meanings = c("respondent", "respondent's spouse", "A's father",
"A's mother", "A's first son", "E1's spouse", "E1's first son",
"E1's second son", "A's second son", "respondent", "respondent's spouse",
"A's father", "A's mother", "A's first son", "E1's spouse", "A's second son",
"A's third son"), edu = c(12, 18, 10, 9, 15, 14, 3, 1, 13, 21,
16, 12, 16, 18, 15, 17, 16)), class = "data.frame", row.names = c(NA,
-17L))
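As a cross-check, the same fedu values can also be obtained with a join instead of match(). The sketch below is an alternative written for illustration, not the code from the answer: it rebuilds the data without the meanings column, assumes that the first letter of relationship identifies the generation, and assumes that the codes B, D and F mark the spouses and mothers to drop. The names fam, gen_of, coded, parent_max and res are hypothetical, and the .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Same families, codes and edu values as in the question
fam <- data.frame(
  family       = rep(c(1, 2), times = c(9, 8)),
  relationship = c("A", "B", "C", "D", "E1", "F1", "G11", "G12", "E2",
                   "A", "B", "C", "D", "E1", "F1", "E2", "E3"),
  edu          = c(12, 18, 10, 9, 15, 14, 3, 1, 13,
                   21, 16, 12, 16, 18, 15, 17, 16)
)

# Assumed mapping from the first letter of the code to a generation
gen_of <- c(C = 1, D = 1, A = 2, B = 2, E = 3, F = 3, G = 4)
coded <- fam %>%
  mutate(generation = unname(gen_of[substr(relationship, 1, 1)]))

# Highest edu per family and generation, re-keyed to the generation
# of the children who should receive it
parent_max <- coded %>%
  group_by(family, generation) %>%
  summarise(fedu = max(edu), .groups = "drop") %>%
  mutate(generation = generation + 1)

# Generation 1 finds no match in the join and therefore gets NA
res <- coded %>%
  filter(!grepl("^[BDF]", relationship)) %>%
  left_join(parent_max, by = c("family", "generation")) %>%
  arrange(family, relationship)
```

For this data, res$fedu agrees with the fedu column of the expected result in the question.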
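This question, "r - How to calculate the highest years of education of the older generation in a family", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/59466424/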