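I have a data frame that I discretized using RWeka. RWeka's discretization creates bins whose labels are wrapped in single quotes. They do not cause any problems, but variables with the 'All' category look ugly when plotted.

Here is the discretized data frame: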

df <- structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast",
"rainy"), class = "factor"), temperature = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes",
"no"), class = "factor")), .Names = c("outlook", "temperature",
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")

How can I remove the single quotes from the data and recreate the factors?

Best answer

This should do it:

# gsub() returns a character vector, so wrap it in factor() to recreate the factor
df$temperature <- factor(gsub("'", "", df$temperature))
df$humidity <- factor(gsub("'", "", df$humidity))
> df
    outlook temperature humidity windy play
1     sunny         All      All FALSE   no
2     sunny         All      All  TRUE   no
3  overcast         All      All FALSE  yes
4     rainy         All      All FALSE  yes
5     rainy         All      All FALSE  yes
6     rainy         All      All  TRUE   no
7  overcast         All      All  TRUE  yes
8     sunny         All      All FALSE   no
9     sunny         All      All FALSE  yes
10    rainy         All      All FALSE  yes
11    sunny         All      All  TRUE  yes
12 overcast         All      All  TRUE  yes
13 overcast         All      All FALSE  yes
14    rainy         All      All  TRUE   no
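
Because `gsub()` works on the values and hands back a character vector, an alternative is to rewrite only the level labels, which leaves the column as a factor with its integer codes untouched. The following is a minimal sketch under that assumption; the three-row `df` is a hypothetical stand-in for the data frame in the question, not the original data.

```r
# Hypothetical stand-in for the question's data frame (same quoted 'All' level)
df <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy")),
  temperature = factor(rep("'All'", 3)),
  humidity    = factor(rep("'All'", 3))
)

# Rewrite the level labels only; fixed = TRUE treats "'" as a literal character
levels(df$temperature) <- gsub("'", "", levels(df$temperature), fixed = TRUE)
levels(df$humidity)    <- gsub("'", "", levels(df$humidity), fixed = TRUE)

# Both columns are still factors, now with the level "All"
str(df)
```

If stripping the quotes makes two labels identical, `levels<-` merges them into one level, which is worth checking on real RWeka output.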

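If you need to do the same for several columns, this may be more convenient:
# lapply() keeps each column separate; apply() would coerce them to a character matrix
df[, 2:3] <- lapply(df[, 2:3], function(x) {
    factor(gsub("'", "", x))
    })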
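
When the affected columns are not known in advance, the same idea can be applied to every factor column while leaving other column types alone. This is only a sketch: `strip_quotes` is a hypothetical helper name introduced for illustration, and the small `df` again stands in for the real data.

```r
# Hypothetical stand-in data with a factor, a quoted factor and a logical column
df <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy"),
                       levels = c("sunny", "overcast", "rainy")),
  temperature = factor(rep("'All'", 3)),
  windy       = c(FALSE, TRUE, FALSE)
)

# Hypothetical helper: strip single quotes from factor levels, pass other types through
strip_quotes <- function(x) {
  if (is.factor(x)) {
    levels(x) <- gsub("'", "", levels(x), fixed = TRUE)
  }
  x
}

# df[] <- ... replaces the columns in place and keeps the data.frame class
df[] <- lapply(df, strip_quotes)
str(df)
```

Editing the levels rather than the values also preserves the existing level order, which matters for plot axes and legends.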

A similar question about removing quotes from factors in a data frame can be found on Stack Overflow: https://stackoverflow.com/questions/12921730/
