本文介绍了获取TypeError:尝试使用idxmax()时,不允许对此dtype进行归约运算'argmax'的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Pandas中使用idxmax()函数时,我一直收到此错误.

When using the idxmax() function in Pandas, I keep receiving this error.

Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

我正在使用的Pandas版本是0.22.0

The Pandas version I am using is 0.22.0

main.py

import ExploratoryDataAnalysis as eda
import Preprocessing as processor
import Classification as classify
import pandas as pd


data_path = '/Users/username/college/year-4/fyp-credit-card-fraud/data/'

if __name__ == '__main__':
    df = pd.read_csv(data_path + 'creditcard.csv')
    # eda.init(df)
    # eda.check_null_values()
    # eda.view_data()
    # eda.check_target_classes()
    df = processor.noramlize(df)

    X_training, X_testing, y_training, y_testing, X_training_undersampled, X_testing_undersampled, \
    y_training_undersampled, y_testing_undersampled = processor.resample(df)

    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)

Classification.py

Classification.py

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.metrics import confusion_matrix, precision_recall_curve, auc, \
    roc_auc_score, roc_curve, recall_score, classification_report
import pandas as pd
import numpy as np


def print_kfold_scores(X_training, y_training):
    print('\nKFold\n')

    fold = KFold(len(y_training), 5, shuffle=False)

    c_param_range = [0.01, 0.1, 1, 10, 100]

    results = pd.DataFrame(index=range(len(c_param_range), 2), columns=['C_parameter', 'Mean recall score'])
    results['C_parameter'] = c_param_range

    j = 0
    for c_param in c_param_range:
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('\n-------------------------------------------')

        recall_accs = []
        for iteration, indices in enumerate(fold, start=1):
            lr = LogisticRegression(C=c_param, penalty='l1')
            lr.fit(X_training.iloc[indices[0], :], y_training.iloc[indices[0], :].values.ravel())

            y_prediction_undersampled = lr.predict(X_training.iloc[indices[1], :].values)
            recall_acc = recall_score(y_training.iloc[indices[1], :].values, y_prediction_undersampled)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration, ': recall score = ', recall_acc)

        results.ix[j, 'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('\nMean recall score ', np.mean(recall_accs))
        print('\n')

    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter'] # Error occurs on this line

    print('*****************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c_param)
    print('*****************************************************************')

    return best_c_param

引起问题的行是

best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']

程序的输出如下

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/username/College/year-4/fyp-credit-card-fraud/code/main.py
/Users/username/Library/Python/3.6/lib/python/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Dataset Ratios

Percentage of genuine transactions:  0.5
Percentage of fraudulent transactions 0.5
Total number of transactions in resampled data:  984


Whole Dataset Split

Number of transactions in training dataset:  199364
Number of transactions in testing dataset:  85443
Total number of transactions in dataset:  284807


Undersampled Dataset Split

Number of transactions in training dataset 688
Number of transactions in testing dataset:  296
Total number of transactions in dataset:  984

KFold

-------------------------------------------
C parameter:  0.01

-------------------------------------------
Iteration  1 : recall score =  0.931506849315
Iteration  2 : recall score =  0.917808219178
Iteration  3 : recall score =  1.0
Iteration  4 : recall score =  0.959459459459
Iteration  5 : recall score =  0.954545454545

Mean recall score  0.9526639965


-------------------------------------------
C parameter:  0.1

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.915254237288
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.909090909091

Mean recall score  0.89652397189


-------------------------------------------
C parameter:  1

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.915853322981


-------------------------------------------
C parameter:  10

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.939393939394

Mean recall score  0.918883626012


-------------------------------------------
C parameter:  100

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.918593049009


Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

Process finished with exit code 1

推荐答案

#best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']

我们应该替换此行代码

主要问题:

1)平均回忆得分"的类型是对象,不能使用"idxmax()"来计算值2)您应该将平均回忆得分"从对象"更改为浮动"3)您可以使用apply(pd.to_numeric,errors ='coerce',axis = 0)来执行此类操作.

We should replace this line of code

The main problem:

1) the type of "mean recall score" is object, you can't use "idxmax()" to calculate the value 2) you should change "mean recall score" from "object " to "float" 3) you can use apply(pd.to_numeric, errors = 'coerce', axis = 0) to do such things.

best_c = results_table
best_c.dtypes.eq(object) # you can see the type of best_c
new = best_c.columns[best_c.dtypes.eq(object)] #get the object column of the best_c
best_c[new] = best_c[new].apply(pd.to_numeric, errors = 'coerce', axis=0) # change the type of object
best_c
best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter'] #calculate the mean values

这篇关于获取TypeError:尝试使用idxmax()时,不允许对此dtype进行归约运算'argmax'的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-24 08:40