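I have a variable named X with a minimum value of zero and a maximum of 2 million, so I cut the values into bins with code like this:
```
df_input['X_bins'] = pd.cut(df_input['X'], bins, right=False)
```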
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/LYPTl.png
At the moment I am using a for loop to replace each bin with its weight of evidence (WOE) value.
```
def flag_dfstd(df_input):
    # Called once per row: returns the WOE value (as a string) of the bin that contains X.
    if (df_input['X'] >= 0) & (df_input['X'] < 100):
        return '-0.157688'
    elif (df_input['X'] >= 100) & (df_input['X'] < 10000):
        return '-0.083307'
    elif (df_input['X'] >= 10000) & (df_input['X'] < 20000):
        return '0.381819'
    elif (df_input['X'] >= 20000) & (df_input['X'] < 50000):
        return '0.364365'
    else:
        return '0'

df_input['X_WOE'] = df_input.apply(flag_dfstd, axis=1).astype(str)
```
Is there a way to replace the bins with their weight of evidence values without using a for loop?
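Best answer

I think you need the `labels` parameter of `cut`. Values that fall outside every bin come back as missing, and to replace them you first have to add the replacement value as a new category with `cat.add_categories`, then call `fillna`:

```
import numpy as np
import pandas as pd

df_input = pd.DataFrame({'X': [0, 20, 100, 10000, 30000, 1000000]})

# Bin edges; with right=False each bin is closed on the left: [-inf, 100), [100, 10000), ...
b = [-np.inf, 100, 10000, 20000, 50000]
# One WOE label per bin, in the same order as the bins
l = ['-0.157688', '-0.083307', '0.381819', '0.364365']

df_input['X_WOE'] = pd.cut(df_input['X'], bins=b, labels=l, right=False)
# '0' must exist as a category before fillna can assign it
df_input['X_WOE'] = df_input['X_WOE'].cat.add_categories(['0']).fillna('0')
print(df_input)
```

```
         X      X_WOE
0        0  -0.157688
1       20  -0.157688
2      100  -0.083307
3    10000   0.381819
4    30000   0.364365
5  1000000          0
```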
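
The answer above keeps the WOE values as strings, matching the question. If the column is going to feed a model, a numeric column may be more convenient. The following is only a sketch of one possible variation, not part of the original answer: it passes floats as `labels` and converts the categorical result with `astype(float)`. It assumes the WOE values are all distinct (since `cut` requires unique labels by default) and that `0.0` is the desired value for rows outside every bin; the name `woe_values` is made up for illustration.

```python
import numpy as np
import pandas as pd

df_input = pd.DataFrame({'X': [0, 20, 100, 10000, 30000, 1000000]})

# Same bin edges as in the answer; right=False gives left-closed bins.
b = [-np.inf, 100, 10000, 20000, 50000]
# Hypothetical name: the same WOE values, but as floats instead of strings.
woe_values = [-0.157688, -0.083307, 0.381819, 0.364365]

# cut returns a categorical whose categories are the float labels.
binned = pd.cut(df_input['X'], bins=b, labels=woe_values, right=False)

# Converting to float turns out-of-range rows into NaN, which fillna then
# replaces with 0.0 (assumed default, mirroring the '0' in the answer).
df_input['X_WOE'] = binned.astype(float).fillna(0.0)

print(df_input)
```

Because the result is a plain float column, no `add_categories` step is needed before `fillna`.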
Regarding python - replacing values with weight of evidence, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59233409/