Problem description
Recently I have been running several experiments to compare the Python packages XGBoost and LightGBM. LightGBM appears to be a newer algorithm that people say outperforms XGBoost in both speed and accuracy.
This is the LightGBM GitHub repository. This is the LightGBM Python API documentation, where you will find the Python functions you can call. They can be called directly from the LightGBM model or through the LightGBM scikit-learn wrapper.
This is the XGBoost Python API I use. As you can see, its data structures are very similar to those of the LightGBM Python API above.
Here is what I tried:
- If you use the train() method in both XGBoost and LightGBM, then yes, LightGBM runs faster and has higher accuracy. But this method does not do cross-validation.
- If you use the cv() method in both algorithms, you get cross-validation. However, I did not find a way to make it return a set of optimum parameters.
- If you use scikit-learn's GridSearchCV() with LGBMClassifier and XGBClassifier, it works for XGBClassifier, but for LGBMClassifier it runs forever.
Here are the code examples where I use GridSearchCV() with both classifiers:
XGBClassifier with GridSearchCV
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_set = {
    'n_estimators': [50, 100, 500, 1000]
}
gsearch = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1, n_estimators=100, max_depth=5,
        min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8,
        nthread=7, objective='binary:logistic', scale_pos_weight=1, seed=410),
    param_grid=param_set, scoring='roc_auc', n_jobs=7, iid=False, cv=10)
xgb_model2 = gsearch.fit(features_train, label_train)
xgb_model2.grid_scores_, xgb_model2.best_params_, xgb_model2.best_score_
This works very well for XGBoost and only took a few seconds.
LightGBM with GridSearchCV
from lightgbm import LGBMClassifier

param_set = {
    'n_estimators': [20, 50]
}
gsearch = GridSearchCV(
    estimator=LGBMClassifier(
        boosting_type='gbdt', num_leaves=30, max_depth=5, learning_rate=0.1,
        n_estimators=50, max_bin=225, subsample_for_bin=0.8, objective=None,
        min_split_gain=0, min_child_weight=5, min_child_samples=10,
        subsample=1, subsample_freq=1, colsample_bytree=1,
        reg_alpha=1, reg_lambda=0, seed=410, nthread=7, silent=True),
    param_grid=param_set, scoring='roc_auc', n_jobs=7, iid=False, cv=10)
lgb_model2 = gsearch.fit(features_train, label_train)
lgb_model2.grid_scores_, lgb_model2.best_params_, lgb_model2.best_score_
However, with LightGBM this approach has been running all morning and has still produced nothing.
I am using the same dataset in both cases; it contains 30,000 records.
I have 2 questions:
- If we just use the cv() method, is there any way to tune the optimum set of parameters?
- Do you know why GridSearchCV() does not work well with LightGBM? I'm wondering whether this only happens to me or whether it has happened to others too.
Recommended answer
Try n_jobs = 1 and see if it works.
In general, if you use n_jobs = -1 or n_jobs > 1, you should protect your script with an if __name__=='__main__': guard.
Simple example:
import ...
if __name__=='__main__':
    data = pd.read_csv('Prior Decompo2.csv', header=None)
    X, y = data.iloc[0:, 0:26].values, data.iloc[0:, 26].values
    param_grid = {'C': [0.01, 0.1, 1, 10], 'kernel': ('rbf', 'linear')}
    classifier = SVC()
    grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, scoring='accuracy', n_jobs=-1, verbose=42)
    grid_search.fit(X, y)
Finally, can you try running your code with n_jobs = -1 and the if __name__=='__main__': guard, as I explained, and see if it works?