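I am trying to recode loan status data so that every observation is either 'Default' or 'Fully Paid'. Specifically, I want to recode any value != 'Fully Paid' as 'Default'.
Here are my values: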
df.loan_status.unique()
array(['Fully Paid', 'Charged Off', 'Default', 'Late (31-120 days)',
'In Grace Period', 'Late (16-30 days)',
'Does not meet the credit policy. Status:Fully Paid',
'Does not meet the credit policy. Status:Charged Off', 'Issued'], dtype=object)
I tried the following code, but it recoded every observation to 'Default':
statuses = df['loan_status'].unique()
for status in statuses:
    if status != 'Fully Paid':
        df['loan_status'] = 'Default'
Any suggestions on how to do this would be greatly appreciated!
Best answer
Option 1 (Andras Deak / MaxU)
This is the approach I like.
df.loc[df.loan_status.ne('Fully Paid'), 'loan_status'] = 'Default'
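As a minimal sketch of why Option 1 works where the loop in the question does not: `df['loan_status'] = 'Default'` assigns to the whole column no matter which `status` the loop is on, whereas `.loc` with a boolean mask assigns only to the rows where the mask is True. The toy data and the name `not_paid` below are assumptions for illustration; only the column name and status strings come from the question.

```python
import pandas as pd

# Hypothetical toy data; the real df in the question is not shown.
df = pd.DataFrame(
    {"loan_status": ["Fully Paid", "Charged Off", "Late (31-120 days)", "Fully Paid"]}
)

# Boolean mask: True for every row whose status is not exactly 'Fully Paid'.
not_paid = df["loan_status"].ne("Fully Paid")

# Assign only to the masked rows; unmasked rows keep their value.
df.loc[not_paid, "loan_status"] = "Default"

result = df["loan_status"].tolist()
print(result)
```

Because the mask is computed once for the whole column, no Python-level loop over the unique statuses is needed.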
Option 2
pd.Series.where
ls = df.loan_status
df.update(ls.where(ls.eq('Fully Paid'), 'Default'))
Option 3
pd.Series.mask
ls = df.loan_status
df.update(ls.mask(ls.ne('Fully Paid')).fillna('Default'))
Option 4
numpy.where
import numpy as np
ls = df.loan_status.values
paid, dflt = 'Fully Paid', 'Default'
df.loc[:, 'loan_status'] = np.where(ls == paid, paid, dflt)
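To check that the four options agree, here is a rough sketch that applies each one to a copy of a small frame and compares the results. The frame `raw`, the helper names `option1` to `option4`, and the use of `to_numpy()` in place of `.values` are assumptions made for illustration; the status strings are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from statuses listed in the question.
raw = pd.DataFrame({"loan_status": [
    "Fully Paid",
    "Charged Off",
    "Issued",
    "Default",
    "Does not meet the credit policy. Status:Fully Paid",
]})

def option1(df):
    df = df.copy()
    df.loc[df.loan_status.ne("Fully Paid"), "loan_status"] = "Default"
    return df

def option2(df):
    df = df.copy()
    ls = df.loan_status
    # where() keeps values where the condition holds, replaces the rest.
    df.update(ls.where(ls.eq("Fully Paid"), "Default"))
    return df

def option3(df):
    df = df.copy()
    ls = df.loan_status
    # mask() blanks values where the condition holds; fillna() fills them in.
    df.update(ls.mask(ls.ne("Fully Paid")).fillna("Default"))
    return df

def option4(df):
    df = df.copy()
    ls = df.loan_status.to_numpy()
    paid, dflt = "Fully Paid", "Default"
    df.loc[:, "loan_status"] = np.where(ls == paid, paid, dflt)
    return df

results = [f(raw)["loan_status"].tolist() for f in (option1, option2, option3, option4)]
all_agree = all(r == results[0] for r in results)
print(all_agree, results[0])
```

Note that all four options compare against the exact string 'Fully Paid', so 'Does not meet the credit policy. Status:Fully Paid' is recoded to 'Default' as well, which matches the != 'Fully Paid' rule stated in the question.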