This article explains how to resolve the error "XGBoostError: value 0 for Parameter num_class should be greater equal to 1" and should be a useful reference for anyone hitting the same problem.

Problem Description

I'm trying to compare two different feature sets for classifying customers into high-value, mid-value, and low-value. This is the code I used:

import xgboost as xgb

# Train a multi-class classifier; .fit() infers num_class from the labels in y_train
ltv_xgb_model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1,
                                  objective='multi:softmax', n_jobs=-1).fit(X_train, y_train)

The first dataset has 11 customers in the training data, and 2 customers in the testing data. The classifier is able to achieve 50% precision for one of the feature sets, despite the limited number of customers.

The second dataset has 14 customers in the training data, and 2 customers in the testing data. Although we have a bigger training set, the classifier threw an error:

XGBoostError: value 0 for Parameter num_class should be greater equal to 1

Previous posts on the forum have mentioned that the .fit() method automatically sets the num_class parameter. See here: XGBClassifier num_class is invalid. Therefore, the problem seems to be caused by something else.
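
For context, here is the difference between the two APIs. This is a minimal sketch with made-up toy data (X and y are illustrative, not from the original post): the sklearn-style XGBClassifier infers num_class inside .fit(), while the native xgb.train API requires it explicitly for multi:softmax.

import numpy as np
import xgboost as xgb

X = np.random.rand(12, 4)        # toy features
y = np.array([0, 1, 2] * 4)      # three integer-encoded classes

# sklearn wrapper: num_class is inferred from the unique labels in y
clf = xgb.XGBClassifier(objective='multi:softmax').fit(X, y)

# native API: num_class must be set explicitly or training fails
booster = xgb.train({'objective': 'multi:softmax', 'num_class': 3},
                    xgb.DMatrix(X, label=y))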

Does anybody have any idea where the problem is? Any help is appreciated!

Recommended Answer

The reason is that XGBoost deduces the number of classes from the training data you give it, and for multi:softmax the minimum number of classes is 3 (if you have 2 classes, you should use a binary classification objective). So most likely the problem here is that your dataset contains only 2 or fewer unique target values.
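
A quick way to confirm this is to inspect the unique labels that actually end up in the failing training split. A minimal sketch, assuming X_train and y_train are the arrays from the question and the labels are already integer-encoded:

import numpy as np
import xgboost as xgb

classes = np.unique(y_train)      # labels present in this training split
print(classes)

if len(classes) >= 3:
    objective = 'multi:softmax'   # 3 or more classes: the multi-class objective is valid
else:
    objective = 'binary:logistic' # only 2 classes survived the split: use a binary objective

ltv_xgb_model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1,
                                  objective=objective, n_jobs=-1).fit(X_train, y_train)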

In general, datasets of 11 and 14 elements are very small. I would strongly recommend against training ML models at this scale. If you really want to check how good your model is with very few training samples, you should do full leave-one-out cross-validation (i.e., train a model the same way but without just one example, and test the resulting model on that example). If the results look good to you (they most likely will not), then you can train a model on the full dataset and use that model.
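
As a sketch of that leave-one-out procedure, assuming X and y hold the full dataset as NumPy arrays (the names are illustrative, not from the original post), scikit-learn's LeaveOneOut splitter can drive the loop:

import xgboost as xgb
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score

preds = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # retrain from scratch with exactly one example held out
    model = xgb.XGBClassifier(max_depth=5, learning_rate=0.1,
                              objective='multi:softmax', n_jobs=-1)
    model.fit(X[train_idx], y[train_idx])
    preds.append(model.predict(X[test_idx])[0])

print('Leave-one-out accuracy:', accuracy_score(y, preds))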
