Problem description
I have a data frame with the following structure:
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))
Now I want to create a data frame from this that contains a named column for each unique value in the test data frame. A unique value is a token terminated by the ';' character; any leading space is not part of the value. Then, for each row, I want to fill the dummy columns with 1 or 0, as shown below:
data.frame(a = c(1,1), ff = c(1,0), cc = c(1,1), rr = c(1,1), e = c(0,1))
  a ff cc rr e
1 1  1  1  1 0
2 1  0  1  1 1
I tried building the data frame with for loops over the unique values in the column, but it is getting too messy. I already have a vector containing the unique values of the column. The problem is how to create the ones and zeros. I tried mutate_all() with grep(), but this did not work.
Recommended answer
I'd use splitstackshape together with mtabulate from the qdapTools package to get this as a one-liner, i.e.
library(splitstackshape)
library(qdapTools)
mtabulate(as.data.frame(t(cSplit(test, 'col', sep = ';', 'wide'))))
#   a cc ff rr e
#V1 1  1  1  1 0
#V2 1  1  0  1 1
It can also be done entirely with splitstackshape, as @A5C1D2H2I1M1N2O1R2T1 mentions in the comments:
cSplit_e(test, "col", ";", mode = "binary", type = "character", fill = 0)
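If you would rather not load extra packages, a base R sketch along the following lines should give the same 0/1 layout. It is not part of the original answer. It assumes every value is separated by ';' with optional surrounding spaces, and the names tokens, lvls and dummies are purely illustrative.

```r
# Example data from the question
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'),
                   stringsAsFactors = FALSE)

# Split each row on ';', trim surrounding spaces, drop empty pieces
tokens <- lapply(strsplit(test$col, ";", fixed = TRUE), function(x) {
  x <- trimws(x)
  x[nzchar(x)]
})

# Unique values, in order of first appearance
lvls <- unique(unlist(tokens))

# One row per input row, one 0/1 column per unique value
m <- do.call(rbind, lapply(tokens, function(x) as.integer(lvls %in% x)))
colnames(m) <- lvls
dummies <- as.data.frame(m)
dummies
```

Here lvls plays the role of the vector of unique values mentioned in the question, so an existing vector could be substituted for it to control the column order.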