
Problem description

If the input to a node of the tree is the data shown, what is the best split? Any split gives the children lower accuracy than the parent, right? So do we keep splitting even though accuracy is decreasing?

Recommended answer

Without the specific data, this is hard to answer.

But simulating similar data can give a rough idea. Here is a tree for such data with max_depth of 3.
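A simulation along these lines can be sketched as follows. This is only an illustration, not the data or code behind the answer: the point layout (a central block of black points surrounded by white ones), the sample size, and the random seed are all assumptions, so the resulting tree will not match the numbers quoted in this answer exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: 2000 points in the unit square.
# Black (1) points fill a central block, white (0) points lie around it.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 2))
in_block = (np.abs(X[:, 0] - 0.5) < 0.3) & (np.abs(X[:, 1] - 0.5) < 0.42)
y = in_block.astype(int)

# Fit a Gini-based tree limited to three levels of splits.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Text dump of the learned splits and the training accuracy.
tree_text = export_text(clf, feature_names=["X[0]", "X[1]"])
train_accuracy = clf.score(X, y)
print(tree_text)
print(f"training accuracy: {train_accuracy:.3f}")
```

The printed rules show which feature and threshold each split uses, which is enough to follow the split-by-split reading of the tree.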

The first split takes all the white dots on the right and classifies them.

The second split takes all the white dots on the left and classifies them.

The third split tries to separate the black points from the white points in the middle by splitting along the y (X[1]) axis.

For the first split, notice that the total Gini is now 0.448 * 1512/2000 + 0.0 * 488/2000 = 0.34, which is less than 0.5. The accuracy after that split is about 75%, because the tree is right on 100% of roughly 25% of the data and on 66% of the remaining 75%.
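The arithmetic for the first split can be checked with a short sketch. The per-class counts below are an assumption: they are chosen to be consistent with the node sizes (1512 and 488) and Gini values (0.448 and 0.0) quoted above, but the answer does not state them directly.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Assumed [black, white] counts, consistent with the quoted figures.
parent = [1000, 1000]   # Gini 0.5 before the split
left = [1000, 512]      # 1512 samples, Gini about 0.448
right = [0, 488]        # 488 samples, pure white, Gini 0.0

n = sum(parent)
children = (left, right)

# Size-weighted Gini of the children after the split.
weighted_gini = sum(sum(child) / n * gini(child) for child in children)

# Accuracy when each child predicts its majority class.
accuracy = sum(max(child) for child in children) / n

print(f"parent gini:   {gini(parent):.3f}")
print(f"weighted gini: {weighted_gini:.3f}")
print(f"accuracy:      {accuracy:.3f}")
```

Under these assumed counts the parent node can only reach 50% accuracy by majority vote, so the split lowers the impurity and raises the accuracy at the same time.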


08-13 19:47