Problem description
I want to apply the scaling module sklearn.preprocessing.scale that scikit-learn offers to center a dataset that I will use to train an SVM classifier.
How can I then store the standardization parameters so that I can also apply them to the data that I want to classify?
I know I can use StandardScaler, but can I somehow serialize it to a file so that I won't have to fit it to my data every time I want to run the classifier?
Recommended answer
I think that the best way is to pickle it post-fit, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and a scaler. By pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this; a sketch of the idea follows.
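A minimal sketch of that approach, assuming hypothetical placeholder data X_train / X_new and an arbitrary file name scaler.pkl:

import pickle
import numpy as np
from sklearn import preprocessing

# Fit the scaler once on the training data (placeholder data here).
X_train = np.array([[1., 2., 3., 4.]]).T
scaler = preprocessing.StandardScaler()
scaler.fit(X_train)

# Persist the fitted scaler so it never needs refitting.
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Later, in the classification script: restore it and apply the
# stored mean/scale to the new data before classifying.
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
X_new = np.array([[0., 5.]]).T
X_new_scaled = scaler.transform(X_new)

joblib.dump / joblib.load work the same way and are what the sklearn persistence docs suggest for objects carrying large numpy arrays.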
Having said that, you can query sklearn.preprocessing.StandardScaler for the fit parameters:
scale_ : ndarray, shape (n_features,)
    Per-feature relative scaling of the data. New in version 0.17: scale_ is recommended instead of the deprecated std_.
mean_ : array of floats, shape [n_features]
    The mean value for each feature in the training set.
The following short snippet illustrates this:
from sklearn import preprocessing
import numpy as np

s = preprocessing.StandardScaler()
s.fit(np.array([[1., 2, 3, 4]]).T)

# Fitted parameters for the single feature:
print((s.mean_, s.scale_))
# (array([ 2.5]), array([ 1.11803399]))
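If you would rather store only these raw numbers (say, in a config file) than pickle the whole object, the transformation can be reproduced by hand; a sketch, using the fact that transform subtracts mean_ and divides by scale_:

import numpy as np
from sklearn import preprocessing

s = preprocessing.StandardScaler()
X = np.array([[1., 2, 3, 4]]).T
s.fit(X)

# Applying the stored parameters manually matches transform().
manual = (X - s.mean_) / s.scale_
assert np.allclose(manual, s.transform(X))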