Problem description
I want to group my DataFrame by a specific column, apply sklearn's preprocessing MinMaxScaler to each group, and store the fitted scaler objects.
My starting point:
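import pandas as pd
from sklearn import preprocessing

scaler = {}  # one fitted scaler per ID
groups = df.groupby('ID')
for name, group in groups:
    values = group.drop(columns='ID')  # the non-numeric ID column cannot be scaled
    scr = preprocessing.MinMaxScaler()
    scr.fit(values)
    scaler[name] = scr
    scaled = scr.transform(values)  # NumPy array; df itself is not modified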
Is this possible with df.groupby('ID').transform?
Update
From my original DataFrame
pd.DataFrame(dict(ID=list('AAABBB'),
                  VL=(0, 10, 10, 100, 100, 200)))
I want to scale all columns based on ID. In this example:
A 0.0
A 1.0
A 1.0
B 0.0
B 0.0
B 1.0
together with the information / scaler object (initialized with fit):
preprocessing.MinMaxScaler().fit( ... )
Recommended answer
You can do it in one direction:
In [62]: from sklearn.preprocessing import minmax_scale
In [63]: df
Out[63]:
ID VL SC
0 A 0 0
1 A 10 1
2 A 10 1
3 B 100 0
4 B 100 0
5 B 200 1
In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))
In [65]: df
Out[65]:
ID VL SC
0 A 0 0
1 A 10 1
2 A 10 1
3 B 100 0
4 B 100 0
5 B 200 1
However, you will not be able to use inverse_transform afterwards, because each call of MinMaxScaler (one per group, i.e. per ID) overwrites the information about your original features...
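If the inverse direction is needed, one option is to keep a separate fitted MinMaxScaler per ID in a dictionary, as in the question's starting point. The following is only a minimal sketch under a few assumptions: the names value_cols, scalers, scaled and restored_b are made up for illustration, and VL is taken to be the only column to scale.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'), VL=(0, 10, 10, 100, 100, 200)))

value_cols = ['VL']                    # assumption: every column except ID is scaled
scalers = {}                           # hypothetical name: one fitted scaler per ID
scaled = df[value_cols].astype(float)  # float copy that receives the scaled values

for name, group in df.groupby('ID'):
    scr = MinMaxScaler().fit(group[value_cols])
    scalers[name] = scr                # keep the fitted scaler for later use
    scaled.loc[group.index, value_cols] = scr.transform(group[value_cols])

df['SC'] = scaled['VL']

# Undo the scaling for group 'B' using the scaler that was fitted on group 'B'
restored_b = scalers['B'].inverse_transform(
    df.loc[df['ID'] == 'B', ['SC']].to_numpy()
)
```

Because each ID has its own scaler object, nothing is overwritten, and scalers[name].inverse_transform can map the scaled values of that group back to the original range. The cost is an explicit Python loop over the groups instead of a single groupby(...).transform call.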