Problem description
I have a data frame where a column may contain concatenated characters separated by |:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))
# df
#     FOO
# 1 A|B|C
# 2   A|B
# 3   B|C
# 4     A
# 5     C
I want to split the string and put the individual values into different columns:
df
#   X1 X2 X3
# 1  A  B  C
# 2  A  B
# 3     B  C
# 4  A
# 5        C
So far I have tried this example: [https://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame][1], but it does not split the column without repeating values. What I get is:
df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))
> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C
I also get a warning.
What can I do in those cases? Preferably with base R.
[1]: Split column at delimiter in data frame
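Recommended answer
Just do this:
# Split each string on a literal "|" (escaped because "|" is a regex metacharacter)
splt <- strsplit(as.character(df$FOO), "\\|")
# All distinct values, sorted; each one gets its own column position
all_val <- sort(unique(unlist(splt)))
# For each row, set the values that are absent to NA, then stack the rows
t(sapply(splt, function(x) { all_val[!(all_val %in% x)] <- NA; all_val }))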
# [,1] [,2] [,3]
#[1,] "A" "B" "C"
#[2,] "A" "B" NA
#[3,] NA "B" "C"
#[4,] "A" NA NA
#[5,] NA NA "C"
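The result above is a character matrix. If a data frame with the X1, X2, X3 column names from the question is wanted, one possible way is to convert the matrix and name the columns by position. This is only a sketch and not part of the original answer; the names `res` and `out` are illustrative, and replacing NA with empty strings is an assumption about how the output should look.

```r
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

# Same steps as in the answer
splt <- strsplit(as.character(df$FOO), "\\|")
all_val <- sort(unique(unlist(splt)))
res <- t(sapply(splt, function(x) { all_val[!(all_val %in% x)] <- NA; all_val }))

# Convert the character matrix to a data frame with positional column names
out <- as.data.frame(res, stringsAsFactors = FALSE)
names(out) <- paste0("X", seq_along(all_val))

# Optional (assumption): show empty strings instead of NA
out[is.na(out)] <- ""
out
```

Keeping the NA values instead of empty strings is usually more convenient if the columns are processed further.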
Data:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))
Please note: my version uses only base R (no libraries needed) and is general. It also works with:
df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
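As a quick check of the general case, the same steps can be wrapped in a small helper and applied to the extended data. This is a sketch under a few assumptions: the name `split_positions` is hypothetical, `fixed = TRUE` is used in place of the escaped regex `"\\|"` (both split on a literal |), and naming the columns after the values is an addition for readability. With the extra row the distinct values are A, B, C, D and F, so the result has five columns.

```r
# Hypothetical helper wrapping the steps from the answer
split_positions <- function(x) {
  # fixed = TRUE treats "|" as a literal character, not a regex
  splt <- strsplit(as.character(x), "|", fixed = TRUE)
  all_val <- sort(unique(unlist(splt)))
  res <- t(sapply(splt, function(v) { all_val[!(all_val %in% v)] <- NA; all_val }))
  # Label each column with the value it holds (an addition, not in the answer)
  colnames(res) <- all_val
  res
}

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))
res <- split_positions(df$FOO)
res

# The last row keeps B, D and F in their own columns:
# unname(res[6, ]) is NA "B" NA "D" "F"
```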