How to create dummy variables per group from another variable in the tidyverse

Problem description

I want to create (dummy) variables that show whether an observation belongs to a group of observations (identifiable by a common Group_ID) with a certain combination of characteristics across that group. The code example makes it clearer what exactly I mean.

I tried combinations of group_by and caret::dummyVars, but had no success. I am running out of ideas, so any help would be much appreciated.

library(tidyverse)

# Input data
# please note: in my case each value of the column Role will appear only once per Group_ID.

input_data <- tribble( ~Group_ID, ~Role, ~Income,
                        #--|--|----
                        1, "a", 3.6,
                        1, "b", 8.5,

                        2, "a", 7.6,
                        2, "c", 9.5,
                        2, "d", 9.7,

                        3, "a", 1.6,
                        3, "b", 4.5,
                        3, "c", 2.7,
                        3, "e", 7.7,

                        4, "b", 3.3,
                        4, "c", 6.2,
)

# desired output
output_data <- tribble( ~Group_ID, ~Role, ~Income, ~Role_A,  ~Role_B, ~Role_C, ~Role_D, ~Role_E, ~All_roles,
                        #--|--|----
                        1, "a", 3.6, 1, 1, 0, 0, 0, "ab",
                        1, "b", 8.5, 1, 1, 0, 0, 0, "ab",

                        2, "a", 7.6, 1, 0, 1, 1, 0, "acd",
                        2, "c", 9.5, 1, 0, 1, 1, 0, "acd",
                        2, "d", 9.7, 1, 0, 1, 1, 0, "acd",

                        3, "a", 1.6, 1, 1, 1, 0, 1, "abce",
                        3, "b", 4.5, 1, 1, 1, 0, 1, "abce",
                        3, "c", 2.7, 1, 1, 1, 0, 1, "abce",
                        3, "e", 7.7, 1, 1, 1, 0, 1, "abce",

                        4, "b", 3.3, 0, 1, 1, 0, 0, "bc",
                        4, "c", 6.2, 0, 1, 1, 0, 0, "bc"
)

Recommended answer

The following takes advantage of base R modeling functions to create the dummies.

First, create a model matrix with no intercept.

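# The response (Group_ID) is arbitrary here; the fit is only used to obtain the design matrix.
# `0 +` drops the intercept, so every level of Role gets its own 0/1 column.
fit <- lm(Group_ID ~ 0 + Role, input_data)
m <- model.matrix(fit)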
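The same matrix can presumably be obtained without fitting a model at all, since model.matrix also accepts a one-sided formula. The sketch below is only an illustration under that assumption: it rebuilds the Group_ID and Role columns as a plain data frame (Income is left out because it plays no part here) and checks what the resulting matrix looks like.

```r
# Rebuild the two relevant columns of input_data as a plain data frame.
input_data <- data.frame(
  Group_ID = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  Role     = c("a", "b", "a", "c", "d", "a", "b", "c", "e", "b", "c")
)

# One-sided formula: no response is needed, and `0 +` removes the intercept,
# so each level of Role gets its own indicator column.
m <- model.matrix(~ 0 + Role, data = input_data)

colnames(m)  # column names are "Role" followed by the level
dim(m)       # one row per observation, one column per role
```

Each row of m has exactly one 1, in the column of that observation's own role; the group-level dummies only appear after aggregating by Group_ID.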

Now process that matrix, noting that the dummies the question asks for are the column sums within each Group_ID group. Because each Role appears at most once per group, those sums are either 0 or 1.

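input_data %>%
  # append the one-hot columns (Rolea ... Rolee) to the original data
  bind_cols(as.data.frame(m)) %>%
  group_by(Group_ID) %>%
  # each role occurs at most once per group, so the group sum is a 0/1 indicator
  mutate(across(matches("Role[[:alpha:]]"), sum)) %>%
  # concatenate the roles present in the group
  mutate(all_roles = paste(Role, collapse = ""))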
## A tibble: 11 x 9
## Groups:   Group_ID [4]
#   Group_ID Role  Income Rolea Roleb Rolec Roled Rolee all_roles
#      <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
# 1        1 a        3.6     1     1     0     0     0 ab
# 2        1 b        8.5     1     1     0     0     0 ab
# 3        2 a        7.6     1     0     1     1     0 acd
# 4        2 c        9.5     1     0     1     1     0 acd
# 5        2 d        9.7     1     0     1     1     0 acd
# 6        3 a        1.6     1     1     1     0     1 abce
# 7        3 b        4.5     1     1     1     0     1 abce
# 8        3 c        2.7     1     1     1     0     1 abce
# 9        3 e        7.7     1     1     1     0     1 abce
#10        4 b        3.3     0     1     1     0     0 bc
#11        4 c        6.2     0     1     1     0     0 bc
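The column names above (Rolea, Roleb, ...) differ slightly from the Role_A ... Role_E and All_roles names in the desired output. As a possible alternative that stays within dplyr and tidyr, the sketch below builds one row of 0/1 indicators per group with pivot_wider and joins it back. It is not part of the original answer; the names dummies and result are illustrative, and it assumes a reasonably recent tidyr in which values_fill accepts a scalar and names_sort is available.

```r
library(tidyverse)

input_data <- tribble(
  ~Group_ID, ~Role, ~Income,
  1, "a", 3.6,
  1, "b", 8.5,
  2, "a", 7.6,
  2, "c", 9.5,
  2, "d", 9.7,
  3, "a", 1.6,
  3, "b", 4.5,
  3, "c", 2.7,
  3, "e", 7.7,
  4, "b", 3.3,
  4, "c", 6.2
)

# One row per group, one 0/1 column per role that occurs anywhere in the data.
dummies <- input_data %>%
  distinct(Group_ID, Role) %>%
  mutate(name = paste0("Role_", toupper(Role)), present = 1L) %>%
  select(-Role) %>%
  pivot_wider(names_from = name, values_from = present,
              values_fill = 0L, names_sort = TRUE)

# Attach the group-level indicators to every observation of the group,
# then add the concatenated role string.
result <- input_data %>%
  left_join(dummies, by = "Group_ID") %>%
  group_by(Group_ID) %>%
  mutate(All_roles = paste(sort(Role), collapse = "")) %>%
  ungroup()

result
```

Using distinct before pivoting means this version would also cope with a role that appears more than once in a group, which the sum-based approach relies on not happening.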


08-11 16:45