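Here is my code. I am trying to normalize a dataset, but the output does not appear to be scaled between 0 and 1. Am I missing something here?
The same code works for the iris dataset. Doesn't normalization always return values scaled between 0 and 1?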
# Normalize the data attributes for the boston dataset.
from sklearn.datasets import load_boston
from sklearn import preprocessing
# load the boston dataset (note: load_boston was removed in scikit-learn 1.2)
dataset = load_boston()
print(dataset.data.shape)
# separate the data from the target attributes
X = dataset.data
y = dataset.target
# normalize the data attributes
normalized_X = preprocessing.normalize(X)
normalized_X[:5]
Output:
array([[1.26388341e-05, 3.59966795e-02, 4.61957387e-03, 0.00000000e+00,
1.07590075e-03, 1.31487871e-02, 1.30387972e-01, 8.17924550e-03,
1.99981553e-03, 5.91945396e-01, 3.05971776e-02, 7.93726783e-01,
9.95908132e-03],
[5.78529889e-05, 0.00000000e+00, 1.49769546e-02, 0.00000000e+00,
9.93520754e-04, 1.36021253e-02, 1.67140272e-01, 1.05222110e-02,
4.23676228e-03, 5.12648235e-01, 3.77071843e-02, 8.40785474e-01,
1.93620036e-02],
[5.85729947e-05, 0.00000000e+00, 1.51744622e-02, 0.00000000e+00,
1.00662274e-03, 1.54212886e-02, 1.31139977e-01, 1.06609718e-02,
4.29263427e-03, 5.19408747e-01, 3.82044450e-02, 8.43137761e-01,
8.64965806e-03],
[7.10489715e-05, 0.00000000e+00, 4.78488594e-03, 0.00000000e+00,
1.00526503e-03, 1.53599229e-02, 1.00526503e-01, 1.33059337e-02,
6.58470542e-03, 4.87268201e-01, 4.10446638e-02, 8.66174100e-01,
6.45301131e-03],
[1.50596596e-04, 0.00000000e+00, 4.75453408e-03, 0.00000000e+00,
9.98888353e-04, 1.55874565e-02, 1.18209058e-01, 1.32215305e-02,
6.54293681e-03, 4.84177324e-01, 4.07843061e-02, 8.65630540e-01,
1.16246177e-02]])
Best answer
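Why do you say the values are not between 0 and 1?
Normalization here does not mean min=0 and max=1; it means that each non-zero sample (row vector) is scaled so that its norm (the L2 norm by default) is 1.
In other words, for each vector, the sum of the squares of its coordinates is 1.
For example, taking your last vector, we can see: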
In [1]: x = [1.50596596e-04, 0.00000000e+00, 4.75453408e-03, 0.00000000e+00,
...: 9.98888353e-04, 1.55874565e-02, 1.18209058e-01, 1.32215305e-02,
...: 6.54293681e-03, 4.84177324e-01, 4.07843061e-02, 8.65630540e-01,
...: 1.16246177e-02]
In [2]: sum(c**2 for c in x)
Out[2]: 0.9999999993530653
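To make the distinction concrete, here is a minimal pure-Python sketch contrasting the two operations: scaling each row to unit L2 norm (what the answer describes) versus rescaling each column to the range [0, 1] (what the question seems to expect). The helper names `l2_normalize_rows` and `min_max_scale_columns` and the toy data are made up for illustration and are not part of scikit-learn.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row so its L2 norm is 1 (all-zero rows are left unchanged)."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result

def min_max_scale_columns(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span > 0 else 0.0
         for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]

# Made-up toy data: 3 samples (rows) x 2 features (columns).
X = [[3.0, 4.0], [6.0, 8.0], [0.0, 10.0]]

unit_rows = l2_normalize_rows(X)        # every row has squared sum 1
scaled_cols = min_max_scale_columns(X)  # every column spans exactly [0, 1]

print(unit_rows)
print(scaled_cols)
```

With non-negative data such as this, the row-normalized values also happen to fall between 0 and 1, but the smallest and largest values in a column are generally not 0 and 1. If column-wise scaling to [0, 1] is what you want, scikit-learn provides `preprocessing.MinMaxScaler` for that, e.g. `MinMaxScaler().fit_transform(X)`.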