Problem description
I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))
a2 <- apply(a, 2, as.factor)
apply(a2, 2, class)
This results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
Recommended answer
apply converts your data.frame to a character matrix. Use lapply:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In the second command, apply converts the result to a character matrix; use lapply instead:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
But for a quick look at the column types you could use str:
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional notes based on the comments:
The first thing that apply does is convert its argument to a matrix. So apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives you:
# chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:3] "x1" "x2" "x3"
There are no factors left, so class returns "character" for all columns. lapply works on columns, so it gives you what you want (it does something like class(a$column_name) for each column).
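The coercion described above can be reproduced on a much smaller data frame. This is only a minimal sketch: the data frame `d` and its column names `num` and `fac` are made up for illustration and are not part of the original question.

```r
# Hypothetical example data: one numeric column and one factor column.
d <- data.frame(num = c(1.5, 2.5, 3.5),
                fac = factor(c("a", "b", "a")))

# A matrix can hold only one type, so as.matrix() turns a data frame
# with any non-numeric column into a character matrix.
m <- as.matrix(d)
is.character(m)

# apply() operates on that matrix, so each column it sees is already character.
col_classes_apply <- apply(d, 2, class)

# lapply() iterates over the data frame's own columns and keeps their types.
col_classes_lapply <- lapply(d, class)
```

Comparing `col_classes_apply` with `col_classes_lapply` shows the difference: the first reports "character" for every column, the second reports the classes stored in the data frame.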
The help page for apply explains why apply and as.factor don't work together.
The help page for sapply explains why sapply and as.factor don't work together either.
You never get a matrix of factors or a data.frame.
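To make this concrete, here is a small sketch of what sapply returns compared with lapply. The data frame `d` is a hypothetical example, not taken from the question.

```r
# Hypothetical example data: two character columns.
d <- data.frame(x = c("a", "b", "a"),
                y = c("u", "v", "v"),
                stringsAsFactors = FALSE)

# sapply() simplifies its result to a matrix; the factor class is lost
# in that step and the values come back as character.
via_sapply <- sapply(d, as.factor)
is.matrix(via_sapply)
is.character(via_sapply)

# lapply() returns a plain list, so each element stays a factor.
via_lapply <- lapply(d, as.factor)
all(vapply(via_lapply, is.factor, logical(1)))
```

Because lapply does no simplification, its result can be passed on to as.data.frame without losing the factor class.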
Simple: use as.data.frame, as suggested in the comments:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
# $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
But if you want to replace selected character columns with factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: chr "a" "b" "c" "d" ...
# $ x2: chr "A" "B" "C" "D" ...
# $ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x3: chr "A" "B" "C" "D" ...
You could use the same trick to replace all columns:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame': 26 obs. of 3 variables:
# $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
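As a variation on the same assignment trick, the columns to convert could be picked by type rather than by name. This is a sketch that goes slightly beyond the original answer: the data frame `d`, its columns, and the helper vector `is_chr` are hypothetical names chosen for illustration.

```r
# Hypothetical example data: one integer column and two character columns.
d <- data.frame(id  = 1:3,
                grp = c("a", "b", "a"),
                lab = c("x", "y", "z"),
                stringsAsFactors = FALSE)

# Logical vector marking which columns are character.
is_chr <- vapply(d, is.character, logical(1))

# Assign back into only those columns, leaving the others untouched.
d[is_chr] <- lapply(d[is_chr], as.factor)
```

Assigning into a subset of `d` keeps it a data.frame, which is the same reason the a3[, columns_to_change] form above works.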