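I have a data frame that I discretized using RWeka. RWeka's discretization creates bins whose labels are wrapped in single quotes. They don't cause any problems, but variables with the 'All' category look ugly when plotted.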
Here is the discretized data frame:
structure(list(outlook = structure(c(1L, 1L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L), .Label = c("sunny", "overcast",
"rainy"), class = "factor"), temperature = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
humidity = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "'All'", class = "factor"),
windy = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE), play = structure(c(2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("yes",
"no"), class = "factor")), .Names = c("outlook", "temperature",
"humidity", "windy", "play"), row.names = c(NA, -14L), class = "data.frame")
How can I remove the single quotes from the data and recreate the factors?
Best answer
This should do it:
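# gsub() returns character, so wrap the result in factor() to rebuild the factor
df$temperature <- factor(gsub("'", "", df$temperature, fixed = TRUE))
df$humidity <- factor(gsub("'", "", df$humidity, fixed = TRUE))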
> df
outlook temperature humidity windy play
1 sunny All All FALSE no
2 sunny All All TRUE no
3 overcast All All FALSE yes
4 rainy All All FALSE yes
5 rainy All All FALSE yes
6 rainy All All TRUE no
7 overcast All All TRUE yes
8 sunny All All FALSE no
9 sunny All All FALSE yes
10 rainy All All FALSE yes
11 sunny All All TRUE yes
12 overcast All All TRUE yes
13 overcast All All FALSE yes
14 rainy All All TRUE no
If you need to do the same for several columns, this may be more convenient:
# lapply() works column by column and keeps each result a factor
df[2:3] <- lapply(df[2:3], function(x) {
  factor(gsub("'", "", x, fixed = TRUE))
})
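Another option, in case you would rather not hard-code the column positions, is to rewrite only the level labels of every factor column. This is a minimal sketch rather than part of the original answer: the small `df` below is a made-up stand-in for the RWeka output, and `is_fac` is just an illustrative helper name.

```r
# Small stand-in for the RWeka output; only the quoted 'All' level matters here.
df <- data.frame(
  outlook     = factor(c("sunny", "overcast", "rainy")),
  temperature = factor(rep("'All'", 3)),
  humidity    = factor(rep("'All'", 3)),
  windy       = c(FALSE, TRUE, FALSE)
)

# Find the factor columns, then strip the quotes from their level labels only.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(f) {
  levels(f) <- gsub("'", "", levels(f), fixed = TRUE)
  f
})

str(df)
```

Because only `levels()` is touched, the columns stay factors throughout and the non-factor columns (such as `windy`) are left alone.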
Source: the original Stack Overflow question on removing quotes from factors in a data frame: https://stackoverflow.com/questions/12921730/