我正在使用以下代码来获得1级概率。
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
clf=RandomForestClassifier(n_estimators=10, random_state = 42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(clf, X, y, cv=k_fold, method='predict_proba')
#print probability of class 1
print(proba[:,1])
我的结果如下。
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.2 0. 0. 0. 0. 0.1 0. 0. 0. 0. 0. 0. 0. 0. 0.9 1. 0.7 1.
1. 1. 1. 0.7 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.9 0.9 0.1 1.
0.6 1. 1. 1. 0.9 0. 1. 1. 1. 1. 1. 0.4 0.9 0.9 1. 1. 1. 0.9
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.9 0.
0.1 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0.8 0. 0.1 0. 0.1 0. 0.1
0.3 0.2 0. 0.6 0. 0. 0. 0.6 0.4 0. 0. 0. 0.8 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. ]
但是,这只是概率列表,难以解释结果。
假设,我也为每个
data point in the
iris数据集都有一个名称列表,如下所示(Iris数据集有150个数据点)。iris_names = ['iris_0', 'iris_1', 'iris_2', 'iris_3', 'iris_4', 'iris_5', 'iris_6', 'iris_7', 'iris_8', 'iris_9', 'iris_10', 'iris_11', 'iris_12', 'iris_13', 'iris_14', 'iris_15', 'iris_16', 'iris_17', 'iris_18', 'iris_19', 'iris_20', 'iris_21', 'iris_22', 'iris_23', 'iris_24', 'iris_25', 'iris_26', 'iris_27', 'iris_28', 'iris_29', 'iris_30', 'iris_31', 'iris_32', 'iris_33', 'iris_34', 'iris_35', 'iris_36', 'iris_37', 'iris_38', 'iris_39', 'iris_40', 'iris_41', 'iris_42', 'iris_43', 'iris_44', 'iris_45', 'iris_46', 'iris_47', 'iris_48', 'iris_49', 'iris_50', 'iris_51', 'iris_52', 'iris_53', 'iris_54', 'iris_55', 'iris_56', 'iris_57', 'iris_58', 'iris_59', 'iris_60', 'iris_61', 'iris_62', 'iris_63', 'iris_64', 'iris_65', 'iris_66', 'iris_67', 'iris_68', 'iris_69', 'iris_70', 'iris_71', 'iris_72', 'iris_73', 'iris_74', 'iris_75', 'iris_76', 'iris_77', 'iris_78', 'iris_79', 'iris_80', 'iris_81', 'iris_82', 'iris_83', 'iris_84', 'iris_85', 'iris_86', 'iris_87', 'iris_88', 'iris_89', 'iris_90', 'iris_91', 'iris_92', 'iris_93', 'iris_94', 'iris_95', 'iris_96', 'iris_97', 'iris_98', 'iris_99', 'iris_100', 'iris_101', 'iris_102', 'iris_103', 'iris_104', 'iris_105', 'iris_106', 'iris_107', 'iris_108', 'iris_109', 'iris_110', 'iris_111', 'iris_112', 'iris_113', 'iris_114', 'iris_115', 'iris_116', 'iris_117', 'iris_118', 'iris_119', 'iris_120', 'iris_121', 'iris_122', 'iris_123', 'iris_124', 'iris_125', 'iris_126', 'iris_127', 'iris_128', 'iris_129', 'iris_130', 'iris_131', 'iris_132', 'iris_133', 'iris_134', 'iris_135', 'iris_136', 'iris_137', 'iris_138', 'iris_139', 'iris_140', 'iris_141', 'iris_142', 'iris_143', 'iris_144', 'iris_145', 'iris_146', 'iris_147', 'iris_148', 'iris_149']
现在,我想对类1的
cross_val_predict
结果进行排序,并将其与iris names
相加。因此,我的预期输出如下。
sorted_probability_of_class_1 = [[iris_xxx, 1], [iris_xxx, 1], ........, [iris_xxx, 0.9], [iris_xxx, 0.8], ........, [iris_xxx, 0], [iris_xxx, 0]]
我该怎么做?
cross_val_predict
中的概率是否与原始数据点的顺序相同?如果需要,我很乐意提供更多详细信息。
最佳答案
使用zip()
将两个列表合并为一个:
sorted_probability_of_class_1 = zip(proba[:, 1], iris_names)
您可能需要先使用
proba
将list(proba)
转换为列表。这是zip
方法更易读的示例:>>> probabilities = [1, 2, 3, 0]
>>> labels = ['a', 'b', 'c', 'd']
>>> list(zip(labels, probabilities))
[('a', 1), ('b', 2), ('c', 3), ('d', 0)]
可以使用
sorted(iterable, key)
和itemgetter
对压缩列表进行排序:>>> from operator import itemgetter
>>> merged_list = list(zip(labels, probabilities))
>>> merged_list
[('a', 1), ('b', 2), ('c', 3), ('d', 0)]
>>> sorted(merged_list, key=itemgetter(1))
[('d', 0), ('a', 1), ('b', 2), ('c', 3)]
itemgetter(1)
访问元组列表中元组的第二个元素。这可能需要根据您的工作代码进行调整。