R: Generating a dummy variable based on whether one column's value exists in another column


Problem description

I have a data frame like this:

A                    B
2012,2013,2014     2011
2012,2013,2014     2012
2012,2013,2014     2013
2012,2013,2014     2014
2012,2013,2014     2015

I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it exists, 0 if it does not. The desired result is:

A                    B       dummy
2012,2013,2014     2011        0
2012,2013,2014     2012        1
2012,2013,2014     2013        1
2012,2013,2014     2014        1
2012,2013,2014     2015        0

I tried to use %in% to achieve this:

df$dummy <- ifelse(df$B %in% df$A, 1, 0)

but every value in the dummy column turned out to be 1.

The same thing happened when I tried another approach, any():

df$dummy <- any(df$A==df$B)

Every value in the dummy column is TRUE.

Is there an efficient way to generate this dummy variable?

Many thanks!

Recommended answer

It looks like column A is a single string of numbers separated by commas, so %in% is not appropriate here. (%in% would be useful if, for example, you were checking for B inside a vector of multiple strings, or of numbers if A and B were numeric.) If your data frame is structured differently, please let me know (and feel free to edit your question).
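To make the point about %in% concrete, here is a minimal sketch using a single made-up pair of values (the names a, b, years, whole and split_match are illustrative and not part of the question). %in% compares whole elements, so the year is only found once the comma-separated string has been split into a vector:

```r
# Illustrative values only: one cell from column A and one from column B.
a <- "2012,2013,2014"
b <- "2013"

# %in% compares whole elements; "2013" is not equal to the full string.
whole <- b %in% a
print(whole)        # FALSE

# Split the string on commas to get a vector of individual years.
years <- strsplit(a, ",", fixed = TRUE)[[1]]

# Now %in% tests membership among the separate years.
split_match <- b %in% years
print(split_match)  # TRUE
```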

There are probably several ways to do this. One simple way is to apply grepl one row at a time to check whether the value in column B is present in A.

library(tidyverse)

df %>%
  rowwise() %>%                    # evaluate grepl() one row at a time
  mutate(dummy = +grepl(B, A))     # unary + converts TRUE/FALSE to 1/0

Output

# A tibble: 5 x 3
  A              B     dummy
  <fct>          <fct> <int>
1 2012,2013,2014 2011      0
2 2012,2013,2014 2012      1
3 2012,2013,2014 2013      1
4 2012,2013,2014 2014      1
5 2012,2013,2014 2015      0
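One caveat: grepl treats B as a regular expression and matches substrings, so a value such as "201" would also count as present. If whole entries need to match exactly, one possible base R alternative is to split each A string and test membership row by row. This is only a sketch, assuming A always uses a plain comma as the separator with no surrounding spaces; the name years_list is introduced here for illustration.

```r
# Example data, as in the question (kept as character rather than factor).
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# Split every A string into a character vector of years (one list element per row).
years_list <- strsplit(df$A, ",", fixed = TRUE)

# For each row, test whether B is one of the years listed in that row's A.
df$dummy <- as.integer(mapply(function(b, years) b %in% years, df$B, years_list))

print(df)
```

Because the comparison is done on the split values, "201" would not match "2012" here, unlike with the grepl approach.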

Data

df <- data.frame(
  A = c(rep("2012,2013,2014", 5)),
  B = c("2011", "2012", "2013", "2014", "2015")
)

