Problem description
I am trying to make sure that all of my factor-type features are fully represented (in terms of all possible factor levels), both in my tree object and in the test set I use for prediction.
for (j in 1:length(predictors)) {
  if (is.factor(Test[, j])) {
    ct[[names(predictors)[j]]] <- union(ct$xlevels[[names(predictors)[j]]], levels(Test[, c(names(predictors)[j])]))
  }
}
However, for the object ct (a ctree from package party), I can't work out how to access the features' factor levels, because I get this error:
Error in ct$xlevels : $ operator not defined for this S4 class
Recommended answer
I have run into this problem countless times, and today I came up with a little hack that should make it unnecessary to fix the discrepancy in factor levels.
Just fit the model on the whole dataset (train + test), giving zero weight to the test observations. This way the ctree model will not drop factor levels.
library(party); library(magrittr)  # ctree() and the %>% pipe
# Training rows only: predict() would fail if newdata had factor levels that do not match the training data
a <- ctree(Y ~ ., data = DF[train.IDs, ]) %>% predict(newdata = DF)
# All rows, passing the IDs as 0-1 weights instead of subsetting the data, so no factor level is dropped
b <- ctree(Y ~ ., data = DF, weights = as.numeric(seq_len(nrow(DF)) %in% train.IDs)) %>% predict(newdata = DF)
mean(a == b)  # fraction of identical predictions; should be 1
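For comparison, here is a minimal base-R sketch of the manual fix that the zero-weight trick is meant to avoid: re-levelling each factor column in both data frames to the union of their levels before fitting. The helper name align_levels and the toy data frames are made up for illustration, and whether this alone is enough for predict() on a party tree has not been checked here.

```r
# Hypothetical toy data: `colour` has a level in test that never occurs in train.
train <- data.frame(colour = factor(c("red", "blue", "red")))
test  <- data.frame(colour = factor(c("blue", "green")))

# Illustrative helper (not part of party): give every factor column shared by
# both data frames the same level set, namely the union of the observed levels.
align_levels <- function(train, test) {
  for (col in names(train)) {
    if (is.factor(train[[col]]) && col %in% names(test)) {
      all_levels <- union(levels(train[[col]]), levels(test[[col]]))
      train[[col]] <- factor(train[[col]], levels = all_levels)
      test[[col]]  <- factor(test[[col]],  levels = all_levels)
    }
  }
  list(train = train, test = test)
}

aligned <- align_levels(train, test)
levels(aligned$train$colour)  # same level set as the test column
levels(aligned$test$colour)
```

The values in each column are unchanged; only the set of declared levels grows, so both frames describe the factor in the same way.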
Tell me if it works as expected!