Question
I am trying to optimize XGBoost execution by giving it the parameter nthread = 16, where my system has 24 cores. But when I train my model, CPU utilization never seems to cross roughly 20% at any point during training. The code snippet is as follows:
param_30 <- list("objective" = "reg:linear",              # linear regression
                 "subsample" = subsample_30,
                 "colsample_bytree" = colsample_bytree_30,
                 "max_depth" = max_depth_30,              # maximum depth of tree
                 "min_child_weight" = min_child_weight_30,
                 "max_delta_step" = max_delta_step_30,
                 "eta" = eta_30,                          # step size shrinkage
                 "gamma" = gamma_30,                      # minimum loss reduction
                 "nthread" = nthreads_30,                 # number of threads to be used
                 "scale_pos_weight" = 1.0)

model <- xgboost(data = training.matrix[, -5],
                 label = training.matrix[, 5],
                 verbose = 1, nrounds = nrounds_30, params = param_30,
                 maximize = FALSE,
                 early_stopping_rounds = searchGrid$early_stopping_rounds_30[x])
Please explain to me (if possible) how I can increase CPU utilization and speed up model training for efficient execution. Code in R would be helpful for further understanding.
Assumption: this concerns XGBoost execution via the R package.
Answer
This is a guess... but I have had this happen to me ...
You are spending too much time communicating during the parallelism and never getting CPU-bound. https://en.wikipedia.org/wiki/CPU-bound
Bottom line: your data isn't large enough (rows and columns), and/or your trees aren't deep enough (max_depth), to warrant that many cores. There is too much overhead. xgboost parallelizes split evaluations, so deep trees on big data can keep the CPU humming at max.
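A quick way to check whether threading actually pays off on a given workload is to time the same training run at a few nthread values. This is a minimal sketch, not the asker's actual pipeline: the data is synthetic, and the sizes (1e5 rows, 50 columns, max_depth = 6, 50 rounds) are arbitrary assumptions you should swap for your own data's dimensions:

```r
library(xgboost)

set.seed(42)
n <- 1e5; p <- 50                       # hypothetical sizes; use your own data
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)
dtrain <- xgb.DMatrix(data = X, label = y)

# Time identical training runs with different thread counts
for (nt in c(1, 4, 16)) {
  t <- system.time(
    xgb.train(params = list(objective = "reg:linear",
                            max_depth = 6,
                            nthread   = nt),
              data = dtrain, nrounds = 50, verbose = 0)
  )
  cat(sprintf("nthread = %2d: %.1f s elapsed\n", nt, t["elapsed"]))
}
```

If elapsed time stops shrinking (or grows) as nthread increases, the run is overhead-bound rather than CPU-bound, which is exactly the situation described above.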
I have trained many models where single-threaded outperforms 8/16 cores. Too much time switching and not enough work.
**MORE DATA, DEEPER TREES, OR FEWER CORES :)**