Problem description
I want to apply the scaling module sklearn.preprocessing.scale that scikit-learn offers to center a dataset that I will use to train an SVM classifier.
How can I then store the standardization parameters so that I can also apply them to the data that I want to classify?
I know I can use StandardScaler, but can I somehow serialize it to a file so that I won't have to fit it to my data every time I want to run the classifier?
Recommended answer
I think that the best way is to pickle it post-fit, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and a scaler. By pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this; a sketch of the idea follows.
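A minimal sketch of that approach, assuming hypothetical placeholder data X_train / X_new and an arbitrary file name scaler.pkl:

import pickle
import numpy as np
from sklearn import preprocessing

# Fit the scaler once on the training data (placeholder data here).
X_train = np.array([[1., 2., 3., 4.]]).T
scaler = preprocessing.StandardScaler()
scaler.fit(X_train)

# Persist the fitted scaler so it never needs refitting.
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Later, in the classification script: restore it and apply the
# stored mean/scale to the new data before classifying.
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
X_new = np.array([[0., 5.]]).T
X_new_scaled = scaler.transform(X_new)

joblib.dump / joblib.load work the same way and are what the sklearn persistence docs suggest for objects carrying large numpy arrays.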
Having said that, you can query sklearn.preprocessing.StandardScaler for the fit parameters:
scale_ : ndarray, shape (n_features,)
    Per-feature relative scaling of the data. New in version 0.17: scale_ is recommended instead of the deprecated std_.
mean_ : array of floats, shape [n_features]
    The mean value for each feature in the training set.
The following short snippet illustrates this:
from sklearn import preprocessing
import numpy as np

s = preprocessing.StandardScaler()
s.fit(np.array([[1., 2, 3, 4]]).T)

# Fitted parameters for the single feature:
print((s.mean_, s.scale_))
# (array([ 2.5]), array([ 1.11803399]))
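If you would rather store only these raw numbers (say, in a config file) than pickle the whole object, the transformation can be reproduced by hand; a sketch, using the fact that transform subtracts mean_ and divides by scale_:

import numpy as np
from sklearn import preprocessing

s = preprocessing.StandardScaler()
X = np.array([[1., 2, 3, 4]]).T
s.fit(X)

# Applying the stored parameters manually matches transform().
manual = (X - s.mean_) / s.scale_
assert np.allclose(manual, s.transform(X))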