Problem description
I am working on a dataset that has missing values in certain columns, and I am trying to use XGBRegressor from XGBoost's Scikit-Learn wrapper interface. It provides a parameter called 'missing' that accepts a float value and otherwise defaults to NaN. How can I use this parameter to fill in the missing values in my dataset's columns? A simple example would be helpful.
Recommended answer
Whatever value you pass as the 'missing' parameter is treated as a missing value. For example, if you pass 0.5, then every 0.5 in your data is treated as missing. The default is NaN. XGBoost does not fill these values in. Instead, at each split it learns from the training data which of the two branches, left or right, should be the default path. Whenever the feature used by a split is missing for a row (say you defined 0.5 as missing and that feature is 0.5), the row follows the default path. Initially I thought it imputed the missing values, but it does not: it only defines one branch as the default and sends every missing value down it. This is described in the paper XGBoost: A Scalable Tree Boosting System.
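With the Scikit-Learn wrapper, the sentinel is passed to the constructor, for example XGBRegressor(missing=0.5). The default-path idea itself can be illustrated without the library. The following is only a rough sketch for a single split, not XGBoost's actual implementation: the helper names (is_missing, learn_default_direction, route) are hypothetical, the data is made up, and plain squared error stands in for the gradient-based gain that XGBoost uses. Treating NaN as missing even when another sentinel is given is also an assumption of this sketch.

```python
import math


def is_missing(value, missing=math.nan):
    """Return True if value is NaN or equals the chosen sentinel (sketch assumption)."""
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == missing


def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def learn_default_direction(xs, ys, threshold, missing=math.nan):
    """Pick the branch for missing values that gives the lower training error."""
    left = [y for x, y in zip(xs, ys) if not is_missing(x, missing) and x < threshold]
    right = [y for x, y in zip(xs, ys) if not is_missing(x, missing) and x >= threshold]
    absent = [y for x, y in zip(xs, ys) if is_missing(x, missing)]
    # Try sending all missing rows left, then right, and compare the errors.
    loss_if_left = sse(left + absent) + sse(right)
    loss_if_right = sse(left) + sse(right + absent)
    return "left" if loss_if_left <= loss_if_right else "right"


def route(x, threshold, default, missing=math.nan):
    """Send a value down the split; missing values take the learned default path."""
    if is_missing(x, missing):
        return default
    return "left" if x < threshold else "right"


# Made-up data in which 0.5 is a sentinel for "missing", not a real measurement.
xs = [1.0, 2.0, 0.5, 8.0, 9.0, 0.5]
ys = [10.0, 11.0, 30.0, 29.0, 31.0, 30.5]

default = learn_default_direction(xs, ys, threshold=5.0, missing=0.5)
print("default path:", default)
print("0.5 goes:", route(0.5, 5.0, default, missing=0.5))
print("1.0 goes:", route(1.0, 5.0, default, missing=0.5))
```

Note that 0.5 is numerically below the threshold of 5.0, yet it is not compared against the threshold at all: it simply follows whichever branch the training data made the default. No value is ever written back into the column, which is why the parameter cannot be used to impute data.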