本文介绍了决策树中特定类别的Sklearn决策规则的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在创建决策树.我的数据属于以下类型

  X1 | X2 | X3 | ..... X50 | Y_____________________________________1 | 5 | 7 | ..... 0 | 11.5 | 34 | 81 | ..... 0 | 14 | 21 | 21 | .... 1 | 065 | 34 | 23 | ..... 1 | 1 

我正在尝试执行以下代码:

  X_train = data.iloc [:,0:51]Y_train = data.iloc [:,51]clf = DecisionTreeClassifier(criterion ="entropy",random_state = 100,max_depth = 8,min_samples_leaf = 15)clf.fit(X_train,y_train) 

我想要的是决定特定类别的决策规则(本例中为"0").例如,

 当X1>4&&X5>78&&X50 = 100然后Y = 0(概率= 84%)当X4 = 56&&X39<100然后Y = 0(概率= 93%)... 

因此,基本上,我希望所有叶节点,附加的决策规则以及Y = 0的概率到来,从而预测Class Y ="0".我也想以上述指定格式打印那些决策规则./p>

我对预测(Y = 1)的决策规则不感兴趣

谢谢,任何帮助将不胜感激

解决方案

基于

创建用于为一个实例打印条件的函数:

  node_indicator = clf.decision_path(Xtrain)n_nodes = clf.tree_.node_count功能= clf.tree_.feature阈值= clf.tree_.thresholdLeave_id = clf.apply(Xtrain)def value2prob(值):返回值/value.sum(axis = 1).reshape(-1,1)def print_condition(sample_id):print("WHEN",end ='')node_index = node_indicator.indices [node_indicator.indptr [sample_id]:node_indicator.indptr [sample_id + 1]]对于n,enumerate(node_index)中的node_id:如果Leave_id [sample_id] == node_id:值= clf.tree_.value [node_id]概率= value2prob(值)print('THEN Y = {}(概率= {})(values = {})'.format(probs.argmax(),probs.max(),值))继续如果n>0:print('&&',end ='')如果(Xtrain [sample_id,feature [node_id]]< =阈值[node_id]):threshold_sign =< ="别的:threshold_sign =>"如果feature [node_id]!= _tree.TREE_UNDEFINED:打印(%s%s%s"%(feature_names [feature [node_id]],#Xtrain [sample_id,feature [node_id]]#实际值threshold_sign,阈值[node_id]),end ='') 

在第一行调用它:

 >>>print_condition(0)当X1>-0.2662498950958252&&X0-1.1966443061828613 THEN Y = 1(概率= 0.9672131147540983)(值= [[2. 59.]]) 

在预测值为零的所有行上调用它:

  [在(clf.predict(Xtrain)== 0).nonzero()[0]]中用于i的print_condition(i) 

I am creating a decision tree.My data is of the following type

X1 |X2 |X3|.....X50|Y
_____________________________________
1  |5  |7 |.....0  |1
1.5|34 |81|.....0  |1
4  |21 |21|.... 1  |0
65 |34 |23|.....1  |1

I am trying following code to execute:

X_train = data.iloc[:,0:51]
Y_train = data.iloc[:,51]
clf = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                           max_depth=8, min_samples_leaf=15)
clf.fit(X_train, y_train)

What I want i decision rules which predict the specific class(In this case "0").For Example,

when X1 > 4 && X5> 78 && X50 =100 Then Y = 0 ( Probability =84%)
When X4 = 56 && X39 < 100 Then Y = 0 ( Probability = 93%)
...

So basically I want all the leaf nodes,decision rules attached to them and probability of Y=0 coming,those predict the Class Y = "0".I also want to print those decision rules in the above specified format.

I am not interested in the decision rules which predict (Y=1)

Thanks, Any help would be appreciated

解决方案

Based on http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html

Assuming that probabilities equal to proportion of classes in each node, e.g.if leaf holds 68 instances with class 0 and 15 with class 1 (i.e. value in tree_ is [68,15]) probabilities are [0.81927711, 0.18072289].

Generarate a simple tree, 4 features, 2 classes:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.cross_validation import train_test_split
from sklearn.tree import _tree

X, y = make_classification(n_informative=3, n_features=4, n_samples=200, n_redundant=1, random_state=42, n_classes=2)
feature_names = ['X0','X1','X2','X3']
Xtrain, Xtest, ytrain, ytest = train_test_split(X,y, random_state=42)
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(Xtrain, ytrain)

Visualize it:

from sklearn.externals.six import StringIO
from sklearn import tree
import pydot
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue()) [0]
graph.write_jpeg('1.jpeg')

Create a function for printing a condition for one instance:

node_indicator = clf.decision_path(Xtrain)
n_nodes = clf.tree_.node_count
feature = clf.tree_.feature
threshold = clf.tree_.threshold
leave_id = clf.apply(Xtrain)


def value2prob(value):
    return value / value.sum(axis=1).reshape(-1, 1)


def print_condition(sample_id):
    print("WHEN", end=' ')
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]
    for n, node_id in enumerate(node_index):
        if leave_id[sample_id] == node_id:
            values = clf.tree_.value[node_id]
            probs = value2prob(values)
            print('THEN Y={} (probability={}) (values={})'.format(
                probs.argmax(), probs.max(), values))
            continue
        if n > 0:
            print('&& ', end='')
        if (Xtrain[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"
        if feature[node_id] != _tree.TREE_UNDEFINED:
            print(
                "%s %s %s" % (
                    feature_names[feature[node_id]],
                    #Xtrain[sample_id,feature[node_id]] # actual value
                    threshold_sign,
                    threshold[node_id]),
                end=' ')

Call it on the first row:

>>> print_condition(0)
WHEN X1 > -0.2662498950958252 && X0 > -1.1966443061828613 THEN Y=1 (probability=0.9672131147540983) (values=[[ 2. 59.]])

Call it on all rows where predicted value is zero:

[print_condition(i) for i in (clf.predict(Xtrain) == 0).nonzero()[0]]

这篇关于决策树中特定类别的Sklearn决策规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-01 08:25