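I have been trying to randomly split my data into a training set and a test set, train a decision tree of depth 5 on it, and plot the resulting tree.
P.S. I am not allowed to use pandas for this.
Here is what I tried: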
import numpy
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn.model_selection import train_test_split
filename = 'diabetes.csv'
raw_data = open(filename, 'rt')
data = numpy.loadtxt(raw_data, delimiter=",", skiprows=1)
print(data.shape)
X = data[:, 0:8]  # first 8 columns are the features
Y = data[:, 9]  # last column (index 9) is the target
print(X)
print(Y)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25)
treeClassifier = DecisionTreeClassifier(max_depth=5)
treeClassifier.fit(X_train, Y_train)
with open("treeClassifier.txt", "w") as f:
    tree.export_graphviz(treeClassifier, out_file=f)
My output is:
(768, 10)
[[ 6. 148. 72. ... 33.6 0.627 50. ]
[ 1. 85. 66. ... 26.6 0.351 31. ]
[ 8. 183. 64. ... 23.3 0.672 32. ]
...
[ 5. 121. 72. ... 26.2 0.245 30. ]
[ 1. 126. 60. ... 30.1 0.349 47. ]
[ 1. 93. 70. ... 30.4 0.315 23. ]]
[1. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1.
1. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1. 0. 0.
1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0.
1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0.
1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0.
0. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0.
1. 1. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 1. 1.
1. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 1. 1. 1. 0.
0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 0.
1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1.
0. 0. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 1. 1. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0.
1. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 0.
0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1.
1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0.
0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 1. 0. 1. 0. 1. 0.
1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1.
0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0.
0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0.
0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1.
1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 1. 1. 1. 1. 0. 1. 1. 0. 0. 0. 0.
0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 1.
1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1.
0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1.
0. 0. 1. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 1. 0.]
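Here is an example of what I would like the generated tree to look like:
The problem is that the nodes in my tree do not show the 'class = 0' / 'class = 1' attribute. I think the problem might be in the
Y = data[:, 9]
part. Column 9 holds the 0/1 label, which is the class attribute, but I cannot see any way to change it so that it shows up in the tree. Perhaps the fix belongs in the tree.export_graphviz
function? Am I missing a parameter? Any help would be appreciated.
Best answer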
If you replace
tree.export_graphviz(treeClassifier, out_file=f)
with
tree.export_graphviz(treeClassifier, class_names=['0', '1'], out_file=f)
you should be fine.
For example,
import graphviz
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import train_test_split
np.random.seed(42)
X = np.random.random((100, 8))
Y = np.random.randint(2, size=100)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)
tree_classifier = DecisionTreeClassifier(max_depth=5)
tree_classifier.fit(X_train, Y_train)
dot_data = tree.export_graphviz(tree_classifier, class_names=['0', '1'], out_file=None)
graph = graphviz.Source(dot_data)
graph  # as the last line of a Jupyter notebook cell, this displays the rendered tree
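As a quick check of the point above, the sketch below exports the same kind of tree with and without class_names and looks for the "class = " line in the generated DOT text. The data is random, and the labels are cast to float only to mimic what numpy.loadtxt returns for a 0/1 column. Building class_names from clf.classes_ is one way to keep the names in the ascending class order that export_graphviz expects. The variable names here are illustrative.

```python
import numpy as np
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
X = rng.random_sample((100, 8))
# Float labels 0.0 / 1.0, similar to what numpy.loadtxt yields for a 0/1 column
Y = rng.randint(2, size=100).astype(float)

clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X, Y)

# class_names must be listed in the ascending order of clf.classes_
class_names = [str(int(c)) for c in clf.classes_]

# out_file=None makes export_graphviz return the DOT source as a string
dot_without = tree.export_graphviz(clf, out_file=None)
dot_with = tree.export_graphviz(clf, class_names=class_names, out_file=None)

print("class = " in dot_without)  # class line is absent without class_names
print("class = " in dot_with)     # class line is present with class_names
```

If the labels were strings or other values, the same list comprehension could simply use str(c).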
To make the tree look more like the example you referenced, you can use
tree.export_graphviz(treeClassifier, class_names=['0', '1'],
                     filled=True, rounded=True, out_file=f)
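If Graphviz is not installed, a plain-text dump of the tree can still confirm which class each leaf predicts. The following is a minimal sketch using sklearn.tree.export_text (available in recent scikit-learn releases) on random data. The feature names x0..x7 are placeholders, not the real column names of diabetes.csv.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)
X = rng.random_sample((100, 8))
Y = rng.randint(2, size=100)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, Y)

# Placeholder feature names; substitute the real column names if known
feature_names = ["x%d" % i for i in range(X.shape[1])]

# Each leaf is printed as a "class: <label>" line
report = export_text(clf, feature_names=feature_names)
print(report)
```

This is only a sanity check and does not replace the Graphviz rendering, which is what produces the coloured, rounded boxes.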
This Q&A about plotting a decision tree on a dataset in scikit-learn (python) comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/51912370/