



I was trying the following code and found that StandardScaler (or MinMaxScaler) and Normalizer from sklearn handle data very differently. This makes building a pipeline more difficult. Is this design discrepancy intentional?

from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler


For Normalizer, the data is read "horizontally".

Normalizer(norm = 'max').fit_transform([[ 1., 1.,  2., 10],
                                        [ 2.,  0.,  0., 100],
                                        [ 0.,  -1., -1., 1000]])
#array([[ 0.1  ,  0.1  ,  0.2  ,  1.   ],
#       [ 0.02 ,  0.   ,  0.   ,  1.   ],
#       [ 0.   , -0.001, -0.001,  1.   ]])


For StandardScaler and MinMaxScaler, the data is read "vertically".

StandardScaler().fit_transform([[ 1., 1.,  2., 10],
                                [ 2.,  0.,  0., 100],
                                [ 0.,  -1., -1., 1000]])
#array([[ 0.        ,  1.22474487,  1.33630621, -0.80538727],
#       [ 1.22474487,  0.        , -0.26726124, -0.60404045],
#       [-1.22474487, -1.22474487, -1.06904497,  1.40942772]])

MinMaxScaler().fit_transform([[ 1., 1.,  2., 10],
                              [ 2.,  0.,  0., 100],
                              [ 0.,  -1., -1., 1000]])
#array([[0.5       , 1.        , 1.        , 0.        ],
#       [1.        , 0.5       , 0.33333333, 0.09090909],
#       [0.        , 0.        , 0.        , 1.        ]])


This is expected behavior, because StandardScaler and Normalizer serve different purposes. The StandardScaler works 'vertically', because it...



[...] Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.


while the Normalizer works 'horizontally', because it...

Normalize[s] samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
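
To make the difference concrete, here is a minimal sketch that reproduces both transformers by hand with NumPy. It is my own illustration rather than anything taken from the docs: the names col_scaled, row_normalized and X_new are made up for this example, and I use the l2 norm instead of the 'max' norm from the question.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# StandardScaler: mean and standard deviation are computed per column
# (axis=0) during fit and stored on the fitted object.
scaler = StandardScaler().fit(X)
col_scaled = scaler.transform(X)
assert np.allclose(col_scaled, (X - X.mean(axis=0)) / X.std(axis=0))

# Later data is scaled with the statistics stored from the training set.
# X_new is a hypothetical unseen sample, used only for illustration.
X_new = np.array([[1., 0., 0., 370.]])
assert np.allclose(scaler.transform(X_new), (X_new - scaler.mean_) / scaler.scale_)

# Normalizer: each row is divided by its own norm (axis=1).
# Nothing is learned from the training set, so fit is effectively a no-op.
row_normalized = Normalizer(norm='l2').fit_transform(X)
assert np.allclose(row_normalized, X / np.linalg.norm(X, axis=1, keepdims=True))
```

So StandardScaler needs a training set to estimate per-feature statistics, while Normalizer only ever looks at one sample at a time.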

Please have a look at the scikit-learn docs (links above) for more insight into which one serves your purpose better.

