Problem description
I have a data frame like this:
A B
2012,2013,2014 2011
2012,2013,2014 2012
2012,2013,2014 2013
2012,2013,2014 2014
2012,2013,2014 2015
I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it exists, 0 if it does not. Like this:
A B dummy
2012,2013,2014 2011 0
2012,2013,2014 2012 1
2012,2013,2014 2013 1
2012,2013,2014 2014 1
2012,2013,2014 2015 0
I have tried to use %in% to achieve this:
df$dummy <- ifelse(df$B %in% df$A, 1, 0)
But it turned out that everything in the dummy column is 1.
The same thing happened when I tried another method, any():
df$dummy <- any(df$A==df$B)
Everything in the dummy column is TRUE.
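A plausible explanation for the any() result, sketched with hypothetical toy vectors (x, y and toy are illustrative names, not from the question): any() reduces its whole input to a single TRUE or FALSE, and assigning a single value to a data frame column repeats it on every row, so the column cannot vary from row to row.

```r
# Hypothetical toy data: only the second pair matches
x <- c("2011", "2012", "2015")
y <- c("2012", "2012", "2013")

# Element-wise comparison gives one logical per row
elementwise <- x == y          # FALSE TRUE FALSE

# any() collapses the whole vector to a single TRUE/FALSE
collapsed <- any(x == y)       # TRUE

# Assigning a length-1 value to a column recycles it to every row
toy <- data.frame(x = x, y = y, stringsAsFactors = FALSE)
toy$dummy <- collapsed
toy$dummy                      # TRUE TRUE TRUE
```

For a per-row answer the comparison has to stay element-wise rather than being wrapped in any().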
Is there an efficient way to generate this dummy variable?
Thanks a lot!
Recommended answer
It looks like column A is a single string of numbers separated by commas, so %in% would not be appropriate (it would be if, for example, you were checking for B inside a vector of separate strings, or of numbers if A and B were numeric). If your data frame is structured differently, please let me know (and feel free to edit your question).
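A base-R sketch of one way to make %in% usable here: split each A string on commas first, so every row has a vector of separate years to test against. It assumes character (not factor) columns named A and B as in the question and no spaces after the commas (wrap the split result in trimws() otherwise).

```r
df <- data.frame(
  A = rep("2012,2013,2014", 5),
  B = c("2011", "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# %in% compares whole elements: the single string "2012,2013,2014" is not "2012"
"2012" %in% "2012,2013,2014"   # FALSE

# Split each A string on commas: a list holding one character vector per row
years <- strsplit(df$A, ",", fixed = TRUE)

# Row by row, test whether B is one of that row's years; convert TRUE/FALSE to 1/0
df$dummy <- as.integer(mapply(function(b, y) b %in% y, df$B, years,
                              USE.NAMES = FALSE))

df$dummy   # 0 1 1 1 0
```

Because each year is compared as a whole element, a value such as "201" would not be counted as present in "2012,2013,2014".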
There are probably several ways to do this. One simple approach is to use grepl one row at a time to check whether the value in column B is present in column A.
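library(tidyverse)

df %>%
  # grepl() uses only the first element of a vector pattern, so evaluate it row by row
  rowwise() %>%
  # grepl() returns TRUE/FALSE; the unary + converts it to 1/0
  mutate(dummy = +grepl(B, A))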
Output
# A tibble: 5 x 3
A B dummy
<fct> <fct> <int>
1 2012,2013,2014 2011 0
2 2012,2013,2014 2012 1
3 2012,2013,2014 2013 1
4 2012,2013,2014 2014 1
5 2012,2013,2014 2015 0
Data
df <- data.frame(
A = c(rep("2012,2013,2014", 5)),
B = c("2011", "2012", "2013", "2014", "2015")
)
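One caveat: grepl(B, A) matches substrings, so a B value that is only part of a year (e.g. "201") would also be flagged as present. A base-R sketch of an anchored pattern that avoids this, using a hypothetical extra row to show the difference and assuming B holds plain digits (no regex metacharacters):

```r
# Hypothetical data: the question's rows plus a row whose B ("201") is a
# substring of "2012" but is not one of the listed years
df <- data.frame(
  A = rep("2012,2013,2014", 6),
  B = c("2011", "2012", "2013", "2014", "2015", "201"),
  stringsAsFactors = FALSE
)

# Plain substring matching, as in the rowwise grepl(B, A) approach
substring_match <- as.integer(mapply(grepl, df$B, df$A, USE.NAMES = FALSE))

# Anchored pattern: B must be preceded by the start of the string or a comma,
# and followed by a comma or the end of the string
anchored_match <- as.integer(mapply(
  function(b, a) grepl(paste0("(^|,)", b, "(,|$)"), a),
  df$B, df$A,
  USE.NAMES = FALSE
))

substring_match  # 0 1 1 1 0 1
anchored_match   # 0 1 1 1 0 0
```

The same anchored pattern can be dropped into the tidyverse version as mutate(dummy = +grepl(paste0("(^|,)", B, "(,|$)"), A)).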