This question already has answers here:
Split data frame string column into multiple columns

(15 answers)


I have a data frame named "final_proj_data" with the following structure:

ID          County              Population     Year
<dbl>       <chr>               <dbl>          <dbl>
1003    Baldwin County, Alabama 169162         2006
1015    Calhoun County, Alabama 112903         2006
1043    Cullman County, Alabama 80187          2006
1049    DeKalb County, Alabama  68014          2006


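I am trying to split the "County" column into two separate columns, "County" and "State", and remove the comma.

I have tried many permutations of the separate() function, but I keep getting this error:


Error: `var` must evaluate to a single number or a column name, not a
character vector


I have tried: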

final_proj_data %>%
  separate(final_proj_data$County, c("State", "County"), sep = ",", remove = TRUE)

final_proj_data %>%
  separate(data = final_proj_data, col = County,
           into = c("State", "County"), sep = ",")


I'm not sure what I'm doing wrong, or why "col =" always throws that error. Any help would be greatly appreciated!

Best answer

Using dplyr and base R:

library(dplyr)

final_proj_data %>%
  mutate(
    # Second piece of each "County, State" string is the state
    State = unlist(lapply(strsplit(County, ", "), function(x) x[2])),
    # Drop the comma and everything after it, leaving only the county
    County = gsub(",.*", "", County)
  )
    ID         County Population Year   State
1 1003 Baldwin County     169162 2006 Alabama
2 1015 Calhoun County     112903 2006 Alabama
3 1043 Cullman County      80187 2006 Alabama
4 1049  DeKalb County      68014 2006 Alabama
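
For readers who want to see what the strsplit() step does on its own, here is a minimal base R sketch without dplyr. It assumes every value has exactly the form "County, State"; the sample rows are copied from the question, and the intermediate name `parts` is only illustrative.

```r
# Sample rows copied from the question
final_proj_data <- data.frame(
  ID = c(1003, 1015, 1043, 1049),
  County = c("Baldwin County, Alabama", "Calhoun County, Alabama",
             "Cullman County, Alabama", "DeKalb County, Alabama"),
  Population = c(169162, 112903, 80187, 68014),
  Year = 2006,
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row,
# e.g. c("Baldwin County", "Alabama") for the first row
parts <- strsplit(final_proj_data$County, ", ", fixed = TRUE)

# Pick the second piece as the state and the first piece as the county
final_proj_data$State  <- vapply(parts, `[`, character(1), 2)
final_proj_data$County <- vapply(parts, `[`, character(1), 1)

print(final_proj_data)
```

Here vapply() is used in place of unlist(lapply(...)) so that a row that does not yield exactly one string per piece raises an error instead of silently changing the length of the result.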


Original:

Using dplyr and tidyr (I only just saw that @Ronak Shah had commented this above):

library(dplyr)
library(tidyr)

final_proj_data %>%
  # sep = ", " also consumes the space after the comma, so State has no leading blank
  separate(County, c("County", "State"), sep = ", ")
    ID         County   State Population Year
1 1003 Baldwin County Alabama     169162 2006
2 1015 Calhoun County Alabama     112903 2006
3 1043 Cullman County Alabama      80187 2006
4 1049  DeKalb County Alabama      68014 2006
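
As for why the attempts in the question failed, the likely cause is how the arguments were passed. The error message says `col` must be a single number or a column name, and `final_proj_data$County` is a whole character vector. The second attempt supplies the data frame twice, once through the pipe and once as `data =`. The names in `into` are also in the opposite order to the pieces of the string. The sketch below is a self-contained version of the call without the pipe, using sample rows copied from the question; the name `result` is only illustrative.

```r
library(tidyr)

# Sample rows copied from the question
final_proj_data <- data.frame(
  ID = c(1003, 1015, 1043, 1049),
  County = c("Baldwin County, Alabama", "Calhoun County, Alabama",
             "Cullman County, Alabama", "DeKalb County, Alabama"),
  Population = c(169162, 112903, 80187, 68014),
  Year = 2006,
  stringsAsFactors = FALSE
)

# Pass the bare column name (not final_proj_data$County), supply the
# data frame only once, and list the new names in the order the pieces
# appear in the string: county first, then state.
result <- separate(final_proj_data, col = County,
                   into = c("County", "State"), sep = ", ")

print(result)
```

An explicit `sep` matters here: by default separate() splits on any run of non-alphanumeric characters, so the space inside "Baldwin County" would also be treated as a separator.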
