TypeError: 'JavaPackage' object is not callable for XGBoost in PySpark

Problem description

I am trying to make the Scala XGBoost API available in my PySpark notebook, following this blog: https://towardsdatascience.com/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb. However, I keep running into the error below:

spark._jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
<py4j.java_gateway.JavaPackage at 0x7fa650fe7a58>
from sparkxgb import XGBoostEstimator

xgboost = XGBoostEstimator(
    featuresCol="features",
    labelCol="Survival",
    predictionCol="prediction"
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-1765fb9e3344> in <module>
      4     featuresCol="features",
      5     labelCol="Survival",
----> 6     predictionCol="prediction"
      7 )

~/spark-assembly-2.4.0-twttr-kryo3-scala2128-hadoop2.9.2.t05/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
    108             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    109         self._input_kwargs = kwargs
--> 110         return func(self, **kwargs)
    111     return wrapper
    112

~/local/spark-3536cd7a-6188-4ca8-b3d0-57d42cd01531/userFiles-0a0d90bc-96b4-43f2-bf21-00ae0e6f7309/sparkxgb.zip/sparkxgb/xgboost.py in __init__(self, checkpoint_path, checkpointInterval, missing, nthread, nworkers, silent, use_external_memory, baseMarginCol, featuresCol, labelCol, predictionCol, weightCol, base_score, booster, eval_metric, num_class, num_round, objective, seed, alpha, colsample_bytree, colsample_bylevel, eta, gamma, grow_policy, max_bin, max_delta_step, max_depth, min_child_weight, reg_lambda, scale_pos_weight, sketch_eps, subsample, tree_method, normalize_type, rate_drop, sample_type, skip_drop, lambda_bias)
    113
    114         super(XGBoostEstimator, self).__init__()
--> 115         self._java_obj = self._new_java_obj("ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator", self.uid)
    116         self._create_params_from_java()
    117         self._setDefault(

~/spark-assembly-2.4.0-twttr-kryo3-scala2128-hadoop2.9.2.t05/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     65             java_obj = getattr(java_obj, name)
     66         java_args = [_py2java(sc, arg) for arg in args]
---> 67         return java_obj(*java_args)
     68
     69     @staticmethod

TypeError: 'JavaPackage' object is not callable
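The REPL output above is the key clue: `spark._jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator` evaluates to a `py4j.java_gateway.JavaPackage`, not a class. py4j hands back a `JavaPackage` placeholder for any dotted name it cannot resolve to a loaded Java class, and calling that placeholder raises exactly this `TypeError`. A minimal sketch of a check, assuming a live `spark` session; `resolves_to_class` is a hypothetical helper (not part of PySpark or py4j), and it compares py4j's type names instead of importing py4j:

```python
def resolves_to_class(jvm, dotted_name):
    """Hypothetical helper: walk a py4j JVM view one attribute at a time and
    report whether the final object is a loaded Java class."""
    obj = jvm
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    # py4j returns a JavaClass when the name resolves to a class on the
    # driver's classpath, and a JavaPackage placeholder otherwise.
    return type(obj).__name__ == "JavaClass"
```

For example, `resolves_to_class(spark._jvm, "ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator")` returning `False` means the driver JVM has no class under that name: either the jar is not loaded, or the loaded version does not define that class.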

I have already googled this error and tried the things below. All of the ideas come from this GitHub issue: https://github.com/JohnSnowLabs/spark-nlp/issues/232

  1. Make sure xgboost4j is in SPARK_DIST_CLASSPATH. Already checked:
    $ echo $SPARK_DIST_CLASSPATH | tr " " "\n" | grep 'xgboost4j' | rev | cut -d'/' -f1 | rev
    xgboost4j-0.72.jar
    xgboost4j-spark.72.jar
  2. Make sure they are added to EXTRA_CLASSPATH. Done.
  3. Update the config:
    export PYSPARK_SUBMIT_ARGS="--conf spark.jars=$SPARK_HOME/jars/* --conf spark.driver.extraClassPath=$SPARK_HOME/jars/* --conf spark.executor.extraClassPath=$SPARK_HOME/jars/* pyspark-shell"
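The classpath check in step 1 can also be run from inside the notebook. A minimal sketch, with `xgboost_jars` as a hypothetical helper; it assumes entries may be separated by whitespace (as the `tr " " "\n"` pipeline implies) or by the platform path separator, and it only inspects the environment variable, not the classpath the running driver JVM actually loaded:

```python
import os
import re

def xgboost_jars(classpath):
    """Hypothetical helper: return {jar_name: version} for every xgboost4j
    jar found in a classpath string (version is None if not parseable)."""
    # Split on whitespace and on os.pathsep (':' on Linux, ';' on Windows).
    entries = re.split(rf"[\s{re.escape(os.pathsep)}]+", classpath)
    found = {}
    for entry in entries:
        name = os.path.basename(entry)
        if "xgboost4j" not in name or not name.endswith(".jar"):
            continue
        # Take the first dotted number in the file name, e.g. "0.72".
        match = re.search(r"\d+(?:\.\d+)+", name)
        found[name] = match.group(0) if match else None
    return found
```

Usage: `xgboost_jars(os.environ.get("SPARK_DIST_CLASSPATH", ""))`. Reading the version out of each file name is what makes a mismatch between the jars and the Python wrapper visible.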

Environment:

  • Machine: Linux
  • Using Jupyter Notebook
  • Spark version 2.4.0
  • Python 3.6

Recommended answer

I found the problem. The sparkxgb.zip wrapper (which I downloaded from the internet) was written for xgboost4j-0.72, but my jars were from xgboost4j-0.9, and the API changed completely between the two versions. As a result, version 0.9 has no class named ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator, so py4j returned a JavaPackage placeholder instead of a class, hence the error. You can see the difference in the API here:

https://github.com/dmlc/xgboost/tree/release_0.72/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark

versus

https://github.com/dmlc/xgboost/tree/v0.90/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark
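The two source trees show the break: `XGBoostEstimator` exists in the release_0.72 package, while v0.90 exposes `XGBoostClassifier` and `XGBoostRegressor` instead. A minimal sketch of choosing the Scala class name from the jar version; `xgboost_spark_classes` is a hypothetical helper, and treating 0.80 as the first release with the new API is an assumption to verify against the XGBoost release notes:

```python
def xgboost_spark_classes(version):
    """Hypothetical helper: Scala estimator class names expected for a given
    xgboost4j-spark version string such as "0.72", "0.9" or "1.0.0"."""
    pkg = "ml.dmlc.xgboost4j.scala.spark."
    major, minor = version.split(".")[:2]
    # 0.x releases were numbered like decimals (0.72, 0.80, 0.90), so compare
    # "major.minor" as a float; this keeps "0.9" and "0.90" equal.
    # Assumption: the refactored API first shipped in 0.80.
    if float(f"{major}.{minor}") < 0.8:
        return [pkg + "XGBoostEstimator"]
    return [pkg + "XGBoostClassifier", pkg + "XGBoostRegressor"]
```

A Python wrapper's `_new_java_obj(...)` call (line 115 of sparkxgb's xgboost.py in the traceback) must name one of these classes; a wrapper written against 0.72 asks for `XGBoostEstimator`, which 0.9 jars do not provide.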
