Problem description
As the sklearn documentation here shows, and as my own experiments confirm, every tree built by DecisionTreeClassifier is a binary tree. Whether the criterion is gini or entropy, each DecisionTreeClassifier node can have only 0, 1, or 2 child nodes.
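The observation above can be checked directly on a fitted model. The following is a minimal sketch, assuming the iris dataset as stand-in data (the original experiment is not shown) and using the `tree_.children_left` and `tree_.children_right` arrays, in which sklearn marks a missing child with -1. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is used only as convenient example data.
X, y = load_iris(return_X_y=True)

child_counts_by_criterion = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)

    # For every node, sklearn stores the index of its left and right child;
    # -1 means "no child on this side".
    left = clf.tree_.children_left
    right = clf.tree_.children_right

    # Count the children of each node (0, 1, or 2).
    counts = (left != -1).astype(int) + (right != -1).astype(int)
    child_counts_by_criterion[criterion] = sorted(set(counts.tolist()))

# Distinct child counts observed per criterion.
print(child_counts_by_criterion)
```

In this sketch no node ends up with more than 2 children. The fitted nodes are either leaves (0 children) or internal nodes with exactly 2 children, so a node with a single child does not seem to occur in practice.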
But according to the decision tree introduction slides (page 3), each node of a theoretical decision tree can have more than 2 child nodes.
So my question is: why is the decision tree structure of sklearn's DecisionTreeClassifier only a binary tree (each DecisionTreeClassifier node can have only 1 or 2 child nodes)? Can we get a tree structure with more than 2 child nodes from DecisionTreeClassifier?
Recommended answer
It is because sklearn's approach is to work with numerical features, not categorical ones. With a numerical feature, it is relatively hard to build a good splitting rule that uses an arbitrary number of thresholds (which is what producing more than 2 children requires). For categorical features, on the other hand (as used in the slides provided), another possible option is to have as many children as there are possible values. Both approaches have their own problems: the categorical approach becomes nearly intractable when a feature has many possible values, and the numerical approach requires a particular feature encoding (one-hot for categorical features). That encoding effectively means you can still express the same tree, but instead of a "species" node with 3 children [dog, cat, human] you get a deeper tree with the decisions [dog, not dog], then [not dog but cat, not dog and not cat but human].
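The "species" example can be sketched in code. This is a small illustration, assuming a toy dataset in which the class label is simply the species itself. The column names `is_dog`, `is_cat`, and `is_human` and the data are made up for the example and do not come from the original answer.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical categorical feature with three possible values.
species = ["dog", "cat", "human"]
rows = species * 5  # 15 toy samples, 5 per species

# One-hot encode by hand: one 0/1 column per possible value.
X = np.array([[1.0 if s == name else 0.0 for name in species] for s in rows])
# Toy target: the class is the index of the species.
y = np.array([species.index(s) for s in rows])

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# A single 3-way split on "species" becomes two nested binary splits.
print(export_text(clf, feature_names=["is_dog", "is_cat", "is_human"]))
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```

The resulting tree still separates the three species perfectly. It does so with two levels of binary decisions rather than one node with three children, which is the trade-off the answer describes.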
So the short answer is no: you cannot get more than 2 children per node with this implementation. In general, however, this is not truly limiting.