


I have a dataset which is a large JSON file. I read it and store it in the trainList variable.

Next, I pre-process it so that I can work with it.


  1. I use k-fold cross-validation to train a classifier and obtain the mean accuracy.
  2. I make the predictions and obtain the accuracy and confusion matrix of that fold.
  3. After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I'll use these values to calculate the sensitivity and specificity.

Finally, I would put these values into an HTML page to show a chart with the TPs of each label.



trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data


#I transform the data from JSON form to a numerical one

#I scale the matrix (don't know why but without it, it makes an error)

#I generate a KFold in order to make cross validation
kf = KFold(n_splits=10, shuffle=True, random_state=1)  # sklearn.model_selection.KFold

#I start the cross validation
for train_indices, test_indices in kf.split(X):
    X_train=[X[ii] for ii in train_indices]
    X_test=[X[ii] for ii in test_indices]
    y_train=[labelList[ii] for ii in train_indices]
    y_test=[labelList[ii] for ii in test_indices]

    #I train the classifier

    #I make the predictions

    #I obtain the accuracy of this fold

    #I obtain the confusion matrix
    cm=confusion_matrix(y_test, predicted)

    #I should calculate the TP,TN, FP and FN
    #I don't know how to continue


If you have two lists holding the predicted and actual values, as it appears you do, you can pass them to a function that calculates TP, FP, TN and FN, something like this:

def perf_measure(y_actual, y_hat):
    # Assumes binary labels encoded as 0 (negative) and 1 (positive).
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1
        if y_actual[i] == y_hat[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1

    return TP, FP, TN, FN
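
The function above only handles labels encoded as 0 and 1. Since the question asks for the TPs of each label, here is a rough one-vs-rest sketch that reads the four counts per label straight from the confusion matrix. It assumes `cm[i][j]` counts samples whose actual label is `i` and whose predicted label is `j`, which is the layout `confusion_matrix` in scikit-learn returns. The name `per_label_counts` and the sample matrix are made up for illustration.

```python
def per_label_counts(cm):
    """Return {label_index: (TP, FP, TN, FN)} for a square confusion matrix.

    Assumption: rows are actual labels, columns are predicted labels.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # actual k, predicted as another label
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another label
        tn = total - tp - fn - fp                  # everything not involving label k
        counts[k] = (tp, fp, tn, fn)
    return counts


# Illustrative 3-label confusion matrix (not real data).
cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 2, 6]]
counts = per_label_counts(cm)
```

The tuple order matches what `perf_measure` returns, so the per-label results can be handled the same way as the binary ones.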


From here, you should be able to calculate the rates that interest you, along with other performance measures such as specificity and sensitivity.
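
For instance, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal sketch follows. The helper name `sensitivity_specificity` and the sample counts are hypothetical, and returning NaN when a denominator is zero is just one possible choice.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Compute sensitivity (true positive rate) and specificity (true negative rate)."""
    # Guard against folds that contain no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Illustrative counts, not taken from any real fold.
sens, spec = sensitivity_specificity(tp=40, fp=5, tn=45, fn=10)
```

The arguments follow the same (TP, FP, TN, FN) order that `perf_measure` returns, so inside the cross-validation loop you could call `sensitivity_specificity(*perf_measure(y_test, predicted))`.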


