Problem description
I would like to export a Caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting.
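library(caret)
library(pmml)
data(iris)
# Placeholder values: the original post does not define these constants;
# they are chosen here only so that the snippet runs.
NUMBER_OF_CV <- 10
REPEATES <- 3
NUMBER_OF_TREES <- 50
# Candidate values for the mtry tuning parameter
rfGrid2 <- expand.grid(.mtry = c(1, 2))
# Repeated k-fold cross-validation
fitControl2 <- trainControl(
  method = "repeatedcv",
  number = NUMBER_OF_CV,
  repeats = REPEATES)
model.Test <- train(Species ~ .,
  data = iris,
  method = "rf",
  trControl = fitControl2,
  ntree = NUMBER_OF_TREES,
  importance = TRUE,
  tuneGrid = rfGrid2)
print(model.Test)
# This call fails with the error shown below
pmml(model.Test)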
Error in UseMethod("pmml") :
no applicable method for 'pmml' applied to an object of class "c('train', 'train.formula')"
I googled for a while and found very little information about exporting to PMML in general. The pmml library does list a method for randomForest:
methods(pmml)
[1] pmml.ada pmml.coxph pmml.cv.glmnet pmml.glm pmml.hclust pmml.itemsets pmml.kmeans
[8] pmml.ksvm pmml.lm pmml.multinom pmml.naiveBayes pmml.nnet pmml.randomForest pmml.rfsrc
[15] pmml.rpart pmml.rules pmml.svm
The export works with a model fitted directly with randomForest, but not with the caret-trained one.
library(randomForest)
iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
# Convert to pmml
pmml(iris.rf)
# this works!!!
str(iris.rf)
List of 19
$ call : language randomForest(formula = Species ~ ., data = iris, ntree = 20)
$ type : chr "classification"
$ predicted : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
...
str(model.Test)
List of 22
$ method : chr "rf"
$ modelInfo :List of 14
..$ label : chr "Random Forest"
..$ library : chr "randomForest"
..$ loop : NULL
..$ type : chr [1:2] "Classification" "Regression"
...
You cannot invoke the pmml method on objects of class train or train.formula (i.e. the class of your model.Test object).
The Caret documentation for the train method says that you can access the best model as the finalModel field. You can then invoke the pmml method on that object.
rf = model.Test$finalModel
pmml(rf)
Unfortunately, it turns out that Caret specifies the RF model using the "matrix interface" (i.e. by setting the x and y fields), not the more common "formula interface" (i.e. by setting the formula field). AFAIK, the "pmml" package does not support the export of such RF models.
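One way to see the difference between the two interfaces is to fit the same forest both ways and inspect the resulting objects. The following is only a sketch: the variable names rf_formula and rf_matrix are made up for illustration, and it shows what randomForest stores in each case rather than how the pmml package decides what it can export.

```r
library(randomForest)
data(iris)
set.seed(1)

# Formula interface: the fitted object keeps the model terms
rf_formula <- randomForest(Species ~ ., data = iris, ntree = 20)

# Matrix interface (x/y): this is how caret calls randomForest internally
rf_matrix <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 20)

class(rf_formula)          # class vector of the formula-based fit
class(rf_matrix)           # class vector of the x/y-based fit
is.null(rf_formula$terms)  # is a terms element stored?
is.null(rf_matrix$terms)
```

If the second model has no terms element, an exporter that relies on formula metadata has nothing to work with, which is consistent with the behaviour described above.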
So, it looks like your best option is a two-step approach. First, use the Caret package to find the most appropriate RF parametrization for your dataset. Second, train the final RF model manually using the "formula interface" with this parametrization.
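The two steps could look roughly like the sketch below. The cross-validation settings, the ntree value, the seed, and the output file name iris_rf.pmml are assumptions chosen for illustration, and writing the result with XML::saveXML assumes that pmml() returns an XML node, as in the package's classic examples.

```r
library(caret)
library(randomForest)
library(pmml)

data(iris)
set.seed(42)

# Step 1: let caret pick the best mtry by repeated cross-validation
# (number, repeats and ntree are illustrative values)
fit_control <- trainControl(method = "repeatedcv", number = 5, repeats = 2)
tuned <- train(Species ~ .,
               data = iris,
               method = "rf",
               trControl = fit_control,
               ntree = 50,
               tuneGrid = expand.grid(mtry = c(1, 2)))
best_mtry <- tuned$bestTune$mtry

# Step 2: refit the final model through the formula interface,
# reusing the tuned mtry and the same number of trees
final_rf <- randomForest(Species ~ ., data = iris,
                         ntree = 50, mtry = best_mtry)

# Export the formula-based model to PMML and write it to disk
rf_pmml <- pmml(final_rf)
XML::saveXML(rf_pmml, file = "iris_rf.pmml")
```

Note that the refitted forest is not tree-for-tree identical to model.Test$finalModel, because the trees are grown again; only the tuned parametrization is carried over.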