This article describes how to split a concatenated column into the corresponding column positions.

Problem description

I have a data frame where a column may contain concatenated characters separated by |:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

# df
#     FOO
# 1 A|B|C
# 2   A|B
# 3   B|C
# 4     A
# 5     C

I want to split the string and put the individual values into different columns:

df
#   X1 X2 X3
# 1  A  B  C
# 2  A  B
# 3     B  C
# 4  A
# 5        C

So far I have tried the approach from this example: [https://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame][1], but it does not leave the missing positions empty. Instead, the values of shorter rows are repeated to fill the row. What I get is:

df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
  X1 X2 X3
1  A  B  C
2  A  B  A
3  B  C  B
4  A  A  A
5  C  C  C

I also get a warning when running this.

What can I do in those cases? Preferably with base R.

[1]: Split column at delimiter in data frame

Recommended answer

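Split each string on |, collect the sorted set of distinct values, and for each row keep the values that are present while setting the others to NA: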

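# Split on a literal "|" (escaped because "|" is a regex metacharacter)
splt <- strsplit(as.character(df$FOO), "\\|")
# Sorted set of every distinct value; each value gets its own column
all_val <- sort(unique(unlist(splt)))
# Per row: set the values that are absent to NA, then stack the rows into a matrix
t(sapply(splt, function(x) { all_val[!(all_val %in% x)] <- NA; all_val }))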


#     [,1] [,2] [,3]
#[1,] "A"  "B"  "C" 
#[2,] "A"  "B"  NA  
#[3,] NA   "B"  "C" 
#[4,] "A"  NA   NA  
#[5,] NA   NA   "C" 
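
To get the layout asked for in the question (a data frame with columns X1, X2, X3 and blanks instead of NA), the matrix can be converted afterwards. The following is a minimal sketch, assuming empty strings are an acceptable stand-in for the blank cells. The names m and res are introduced here for illustration only, and fixed = TRUE is used in place of the escaped regex, which is equivalent for a literal |.

```r
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

# Same steps as in the answer above
splt <- strsplit(as.character(df$FOO), "|", fixed = TRUE)
all_val <- sort(unique(unlist(splt)))
m <- t(sapply(splt, function(x) { all_val[!(all_val %in% x)] <- NA; all_val }))

# Name the columns explicitly (as.data.frame would otherwise use V1, V2, ...)
colnames(m) <- paste0("X", seq_len(ncol(m)))

res <- as.data.frame(m, stringsAsFactors = FALSE)
res[is.na(res)] <- ""   # assumption: blanks are wanted instead of NA
print(res)
```

Keeping NA rather than "" is usually the better choice if the result is processed further, since functions such as is.na() and complete.cases() then work as expected.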

Data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Note:

My version uses only base R (no libraries needed) and is general:

It also works with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
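
To check this on the larger input, the steps can be wrapped in a small helper. This is only a sketch: the function name split_to_positions and its sep argument are hypothetical, labelling the columns with the values themselves is an added assumption, and do.call(rbind, lapply(...)) is swapped in for t(sapply(...)) so that the result stays a matrix even when there is only one distinct value.

```r
# Hypothetical helper: one column per distinct value, NA where the value is absent
split_to_positions <- function(x, sep = "|") {
  # fixed = TRUE treats sep as a literal string, so "|" needs no escaping
  splt <- strsplit(as.character(x), sep, fixed = TRUE)
  all_val <- sort(unique(unlist(splt)))
  # One row per input element, stacked with rbind
  m <- do.call(rbind, lapply(splt, function(v) {
    ifelse(all_val %in% v, all_val, NA_character_)
  }))
  colnames(m) <- all_val   # assumption: label each column with its value
  m
}

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
res <- split_to_positions(df$FOO)
print(res)
```

With the six-row example the result has five columns (A, B, C, D, F), and the last row holds B, D and F with NA in the A and C positions.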
