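In a fitted decision tree regression model, every node shows an mse attribute when the tree structure is viewed with graphviz. I need to get the MSE of each leaf node and perform further operations based on it. However, after reading the documentation, I cannot find a method that outputs the MSE. Other attributes, such as feature names, sample counts, and predicted values, all have a corresponding way to access them.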


Using help(sklearn.tree._tree.Tree), I can see that most attributes have some way of outputting their values, but I don't see anything about MSE.

Help on class Tree in module sklearn.tree._tree

Best answer

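Good question. You need tree_reg.tree_.impurity, which holds the impurity of every node in the fitted tree. For a regressor using the default squared-error criterion, that impurity is the node's MSE.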

Short answer:

from sklearn import tree

# X_train and y_train are assumed to be defined already (see the long answer).
tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)

extracted_MSEs = tree_reg.tree_.impurity  # one impurity (MSE) value per node

for idx, MSE in enumerate(extracted_MSEs):
    print("Node {} has MSE {}".format(idx, MSE))

Node 0 has MSE 86.873403833
Node 1 has MSE 40.3211827171
Node 2 has MSE 25.6934820064
Node 3 has MSE 19.0053469592
Node 4 has MSE 74.6839429717
Node 5 has MSE 38.3057346817
Node 6 has MSE 39.6709615385
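
The output above lists every node, while the question asks about leaf nodes only. A minimal sketch of how the leaves could be picked out follows. It assumes the bundled load_diabetes dataset as stand-in data (not the data used in this answer), and the names is_leaf, leaf_ids and leaf_mse are made up for illustration. It relies on children_left being -1 for leaf nodes.

```python
import numpy as np
from sklearn import datasets, tree

# Stand-in data for illustration only (an assumption, not the answer's dataset).
X, y = datasets.load_diabetes(return_X_y=True)

tree_reg = tree.DecisionTreeRegressor(max_depth=2, random_state=0)
tree_reg.fit(X, y)

t = tree_reg.tree_
# A node is a leaf when it has no left child; scikit-learn marks this with -1.
is_leaf = t.children_left == -1
leaf_ids = np.flatnonzero(is_leaf)

# Map each leaf's node id to its impurity (the MSE under the default criterion).
leaf_mse = {int(node_id): float(t.impurity[node_id]) for node_id in leaf_ids}

for node_id, mse in leaf_mse.items():
    print("Leaf {} has MSE {:.4f} over {} samples".format(
        node_id, mse, t.n_node_samples[node_id]))
```

The node ids used as dictionary keys are the same indices that enumerate(tree_reg.tree_.impurity) produces, so they can be matched against the full per-node listing.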


Long answer using the Boston dataset, with visual output:
import pandas as pd
from sklearn import model_selection, datasets, tree
import graphviz

# Note: load_boston was removed in scikit-learn 1.2, so this runs only on older versions.
house_prices = datasets.load_boston()

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    pd.DataFrame(house_prices.data, columns=house_prices.feature_names),
    pd.Series(house_prices.target, name="med_price"),
    test_size=0.20, random_state=42)

tree_reg = tree.DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X_train, y_train)

extracted_MSEs = tree_reg.tree_.impurity  # per-node impurity, i.e. the MSE of each node
print(extracted_MSEs)
#[86.87340383 40.32118272 25.69348201 19.00534696 74.68394297 38.30573468 39.67096154]

# Compare visually
dot_data = tree.export_graphviz(tree_reg, out_file=None, feature_names=X_train.columns)
graph = graphviz.Source(dot_data)

# This creates a boston.pdf file showing the rendered tree
graph.render("boston")

Compare these MSE values with the mse shown in each node of the rendered tree.

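
The comparison can also be done numerically rather than by eye. The sketch below is one way to do that, again assuming the bundled load_diabetes dataset as stand-in data and using hypothetical names (leaf_of_sample, manual_mse, mse_per_sample). It recomputes each leaf's MSE by hand from the training targets and sets it next to the value stored in tree_.impurity. It assumes the default squared-error criterion; with another criterion, such as absolute error, impurity would not be an MSE. Newer scikit-learn releases appear to label this value squared_error rather than mse in the rendered tree.

```python
import numpy as np
from sklearn import datasets, tree

# Stand-in data for illustration only (an assumption, not the answer's dataset).
X, y = datasets.load_diabetes(return_X_y=True)
tree_reg = tree.DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# apply() returns, for every sample, the id of the leaf it ends up in.
leaf_of_sample = tree_reg.apply(X)

# Recompute each leaf's MSE by hand: mean squared deviation from the leaf mean.
manual_mse = {}
for leaf_id in np.unique(leaf_of_sample):
    targets = y[leaf_of_sample == leaf_id]
    manual_mse[int(leaf_id)] = float(np.mean((targets - targets.mean()) ** 2))

# Print the hand-computed value next to what the fitted tree stores.
for leaf_id, mse in manual_mse.items():
    stored = tree_reg.tree_.impurity[leaf_id]
    print("Leaf {}: manual {:.4f}, stored {:.4f}".format(leaf_id, mse, stored))

# MSE of the leaf behind each sample's prediction, one value per row of X.
mse_per_sample = tree_reg.tree_.impurity[leaf_of_sample]
```

The mse_per_sample array is one possible starting point for the "further operations" mentioned in the question, for example flagging predictions that come from high-variance leaves.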

Regarding "python - How to get the MSE of a node in scikit-learn's DecisionTreeRegressor?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59375220/

10-12 19:59