I have a variable named X whose minimum value is 0 and maximum is 2 million, so I binned the values with this code:

```
df_input['X_bins'] = pd.cut(df_input['X'], bins, right=False)
```
[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/LYPTl.png

Currently I use a loop (a row-wise `apply`) to replace each bin with its weight of evidence (WOE) value:

```
import pandas as pd

def flag_dfstd(row):
    # Return the WOE value of the half-open bin [lower, upper) that X falls in
    if (row['X'] >= 0) and (row['X'] < 100):
        return '-0.157688'
    elif (row['X'] >= 100) and (row['X'] < 10000):
        return '-0.083307'
    elif (row['X'] >= 10000) and (row['X'] < 20000):
        return '0.381819'
    elif (row['X'] >= 20000) and (row['X'] < 50000):
        return '0.364365'
    else:
        return '0'

# apply with axis=1 calls flag_dfstd once per row, i.e. a Python-level loop
df_input['X_WOE'] = df_input.apply(flag_dfstd, axis=1).astype(str)
```
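For comparison, the same `if`/`elif` chain can be evaluated on the whole column at once with NumPy's `np.select`, a different loop-free technique from the `pd.cut` answer. This is a sketch that assumes the sample data used in the answer; it mirrors `flag_dfstd` exactly, including the `'0'` fallback for values below 0 or at/above 50000.

```python
import numpy as np
import pandas as pd

# Sample data, same as in the answer (assumed for illustration)
df_input = pd.DataFrame({'X': [0, 20, 100, 10000, 30000, 1000000]})

x = df_input['X']
# The same half-open ranges as flag_dfstd, computed as boolean masks over the column
conditions = [
    (x >= 0) & (x < 100),
    (x >= 100) & (x < 10000),
    (x >= 10000) & (x < 20000),
    (x >= 20000) & (x < 50000),
]
choices = ['-0.157688', '-0.083307', '0.381819', '0.364365']

# np.select takes the choice of the first matching condition, else the default '0'
df_input['X_WOE'] = np.select(conditions, choices, default='0')
print(df_input)
```

Because every condition is a vectorized comparison, there is no per-row Python call as with `apply(..., axis=1)`.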

Is there a way to replace the bins with their weight of evidence values without using a loop?

Best answer

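I think you need `cut` with the `labels` parameter. Values that fall outside every bin become missing, so to replace them you must first add the new category with `add_categories` before calling `fillna`: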

```
import numpy as np
import pandas as pd

df_input = pd.DataFrame({'X':[0,20,100, 10000, 30000, 1000000]})

b = [-np.inf, 100, 10000, 20000, 50000]
l = ['-0.157688', '-0.083307', '0.381819', '0.364365']

# right=False makes the bins [a, b); X >= 50000 is outside every bin -> NaN
df_input['X_WOE'] = pd.cut(df_input['X'], bins=b, labels=l, right=False)
# '0' must exist as a category before fillna can use it
df_input['X_WOE'] = df_input['X_WOE'].cat.add_categories(['0']).fillna('0')
print (df_input)
```

```
         X      X_WOE
0        0  -0.157688
1       20  -0.157688
2      100  -0.083307
3    10000   0.381819
4    30000   0.364365
5  1000000          0
```
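If numeric WOE values are preferred over strings, one variation on the answer is to give `pd.cut` a final catch-all bin and float labels, so no `fillna` step is needed. This is a sketch, not the answerer's code: the bin edges below are an assumption reconstructed from `flag_dfstd` (the question's actual `bins` are only visible in the screenshot), and any value below 0 would still come out as NaN.

```python
import numpy as np
import pandas as pd

df_input = pd.DataFrame({'X': [0, 20, 100, 10000, 30000, 1000000]})

# Assumed bin edges, reconstructed from flag_dfstd plus a catch-all last bin
b = [0, 100, 10000, 20000, 50000, np.inf]
# One WOE value per bin; the last bin takes the "else" value 0 from flag_dfstd
woe = [-0.157688, -0.083307, 0.381819, 0.364365, 0.0]

# right=False gives half-open bins [a, b), matching the >= / < tests;
# astype(float) turns the categorical labels into a plain float column
df_input['X_WOE'] = pd.cut(df_input['X'], bins=b, labels=woe, right=False).astype(float)
print(df_input)
```

A float column is usually what a downstream model expects for WOE-encoded features, whereas the string labels in the question would need converting later anyway.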

10-12 22:40