Problem description
I noticed that scikit-learn's clf.tree_.feature occasionally returns negative values, for example -2. As far as I understand, clf.tree_.feature is supposed to return the positional index of each feature. Given an array of feature names ['feature_one', 'feature_two', 'feature_three'], -2 would then refer to feature_two. I am surprised by the use of a negative index; it would make more sense to refer to feature_two by index 1. (-2 is a reference convenient for human reading, not for machine processing.) Am I reading it correctly?
Update: here is an example:
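import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_ordering():
    # Load the feature matrix and the labels from CSV files
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=10, random_state=99)
    dt.fit(X, Y)
    # One entry per tree node: the index of the feature the node splits on
    print(dt.tree_.feature)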
Here is the output:
[ 8 9 -2 -2 9 4 -2 9 8 -2 -2 0 0 9 9 8 -2 -2 9 -2 -2 6 -2 -2 -2
2 -2 9 8 6 9 -2 -2 -2 8 9 -2 9 6 -2 -2 -2 6 -2 -2 9 -2 6 -2 -2
2 -2 -2]
Recommended answer
Reading the Cython source code for the tree builder shows that the -2 values are not indices at all: they are dummy values stored in the feature split attribute of leaf nodes, which do not split on any feature.
# Excerpts; _TREE_UNDEFINED holds the same value as TREE_UNDEFINED
TREE_UNDEFINED = -2

if is_leaf:
    # Node is not expandable; set node as leaf
    node.left_child = _TREE_LEAF
    node.right_child = _TREE_LEAF
    node.feature = _TREE_UNDEFINED
    node.threshold = _TREE_UNDEFINED
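To make the consequence concrete, here is a minimal sketch of how the feature array could be turned into readable labels while treating -2 as "leaf" rather than as a negative index. It is written in plain Python so it runs without a fitted model. The helper describe_nodes and the names f0 to f9 are made up for illustration, since the question does not list the real feature names.

```python
# Sentinel that scikit-learn stores in `feature` for leaf nodes.
TREE_UNDEFINED = -2


def describe_nodes(feature, feature_names):
    """Map each node's feature index to a name, or "<leaf>" for leaf nodes.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    labels = []
    for idx in feature:
        if idx == TREE_UNDEFINED:
            # Leaf node: it does not split on any feature.
            labels.append("<leaf>")
        else:
            # Internal node: a regular non-negative column index.
            labels.append(feature_names[idx])
    return labels


# First nine entries of the array printed in the question.
feature = [8, 9, -2, -2, 9, 4, -2, 9, 8]
# Assumed placeholder names for the ten input columns.
feature_names = ["f%d" % i for i in range(10)]

labels = describe_nodes(feature, feature_names)
print(labels)
```

With a fitted classifier, dt.tree_.feature could be passed in place of the hard-coded list. Comparing against the sentinel explicitly matters because feature_names[-2] would otherwise silently return the second-to-last name instead of failing.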