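I often work with tables full of special characters (for example á, ľ, š, č, ť, ž, ý, í, é).
I found a very useful function called mgsub, which can perform multiple string replacements at once.
It works well on vectors, but I am struggling to apply the function to a whole data frame.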

The mgsub function works as follows:

library(mgsub)
mgsub::mgsub("...A čo i tam dušu dáš v tom boji divokom: Mor ty len, a voľ nebyť, ako byť otrokom.",
             pattern = c(".","A","č","š","á",":",",","ľ","ť","M"," "),
         replacement = c("","a","c","s","a","","","","t","m",""), fixed = TRUE)
[1] "acoitamdusudasvtombojidivokommortylenavonebytakobytotrokom"


But how can this function be applied to a whole data.frame? For example, to this data.frame:

my.df <- data.frame(v1 = c("...A čo i tam dušu","dáš v tom boji"),
                    v2 = c("divokom:","Mor ty len,"),
                    v3 = c("a voľ nebyť,","ako byť otrokom."))

                  v1          v2               v3
1 ...A čo i tam dušu    divokom:     a voľ nebyť,
2     dáš v tom boji Mor ty len, ako byť otrokom.


I tried lapply, but it only gives an error:

data.frame(lapply(my.df, mgsub::mgsub,
                  pattern = c(".","A","č","š","á",":",",","ľ","ť","M"," "),
                  replacement = c("","a","c","s","a","","","","t","m",""), fixed = TRUE))
Error in nchar(string) : 'nchar()' requires a character vector


Any suggestions are welcome.

Best answer

The problem is that the columns are factors, while mgsub requires character input. According to ?mgsub:


string: a character vector where replacements are sought
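
One way to confirm this diagnosis is to inspect the column classes. The following is a minimal sketch, not part of the original answer; it sets stringsAsFactors = TRUE explicitly (an assumption made so the result is the same on any R version) and uses the helper names col.classes and res purely for illustration.

```r
# Rebuild the example data with factor columns, regardless of R version.
my.df <- data.frame(v1 = c("...A čo i tam dušu", "dáš v tom boji"),
                    v2 = c("divokom:", "Mor ty len,"),
                    v3 = c("a voľ nebyť,", "ako byť otrokom."),
                    stringsAsFactors = TRUE)

# Report the class of every column.
col.classes <- vapply(my.df, class, character(1))
print(col.classes)

# nchar() refuses factor input; capture the error message instead of stopping.
res <- tryCatch(nchar(my.df$v1), error = function(e) conditionMessage(e))
print(res)
```

If every column reports "factor", the nchar() error from the question is the expected symptom.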

Convert all columns to character:

my.df[] <- lapply(my.df, as.character)
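
Note that the line above converts every column, including numeric ones. If the data frame also holds non-text columns, a sketch along the following lines converts only the factor columns. The data frame mixed.df and its numeric column n are hypothetical and exist only for this illustration.

```r
# Hypothetical data frame: two factor columns plus one numeric column.
mixed.df <- data.frame(v1 = c("dušu", "dáš"),
                       v2 = c("divokom:", "Mor ty len,"),
                       n  = c(1.5, 2),
                       stringsAsFactors = TRUE)

# Flag the factor columns, then convert only those to character.
is.fac <- vapply(mixed.df, is.factor, logical(1))
mixed.df[is.fac] <- lapply(mixed.df[is.fac], as.character)

# Column classes after the conversion.
print(vapply(mixed.df, class, character(1)))
```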


or use type.convert:

my.df <- type.convert(my.df, as.is = TRUE)


or pass stringsAsFactors = FALSE when creating the data.frame, because the default in data.frame was stringsAsFactors = TRUE before R 4.0.0 (since R 4.0.0 the default is FALSE):

my.df <- data.frame(v1 = c("...A čo i tam dušu","dáš v tom boji"),
                    v2 = c("divokom:","Mor ty len,"),
                    v3 = c("a voľ nebyť,","ako byť otrokom."),
         stringsAsFactors = FALSE)
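
Once the columns are character, the lapply call from the question should run. The sketch below puts the pieces together, assuming the mgsub package is installed. The names patterns, replacements, and cleaned are introduced here for illustration, and assigning into cleaned[] keeps the result a data.frame without re-creating factors.

```r
# Example data with character columns.
my.df <- data.frame(v1 = c("...A čo i tam dušu", "dáš v tom boji"),
                    v2 = c("divokom:", "Mor ty len,"),
                    v3 = c("a voľ nebyť,", "ako byť otrokom."),
                    stringsAsFactors = FALSE)

# Same pattern/replacement pairs as in the question.
patterns     <- c(".", "A", "č", "š", "á", ":", ",", "ľ", "ť", "M", " ")
replacements <- c("", "a", "c", "s", "a", "", "", "", "t", "m", "")

# Apply mgsub column by column; cleaned[] preserves the data.frame structure.
cleaned <- my.df
cleaned[] <- lapply(my.df, mgsub::mgsub,
                    pattern = patterns,
                    replacement = replacements,
                    fixed = TRUE)
print(cleaned)
```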
