莫烦 scikit-learn Self-Study, Day 6: Standardizing the Feature Matrix

1. Hands-on code

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import numpy as np
from sklearn import preprocessing
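# Note: sklearn.cross_validation was deprecated in 0.18 in favor of
# sklearn.model_selection, which provides train_test_split in newer versions.
from sklearn.cross_validation import train_test_split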
from sklearn.datasets import make_classification
from sklearn.svm import SVC
import matplotlib.pyplot as plt

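# Generate the sample data
X, y = make_classification(n_samples=300, # generate 300 samples
                           n_features=2, # two features per sample
                           n_redundant=0,
                           n_informative=2, # both features are informative
                           random_state=22, # fixed seed: the same data on every run
                           n_clusters_per_class=1,
                           scale=100) # multiply the feature values by 100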

# Standardize the feature matrix so that each feature (column)
# has zero mean and unit variance
X = preprocessing.scale(X)

# Split the samples into training data and test data (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create the model (a support vector classifier)
df = SVC()

# Train the model
df.fit(X_train, y_train)

# Evaluate the trained model on the test data (mean accuracy)
print(df.score(X_test, y_test))

Output:

/Users/liudaoqiang/PycharmProjects/numpy/venv/bin/python /Users/liudaoqiang/Project/python_project/sklearn-day06/normalization.py
/Users/liudaoqiang/PycharmProjects/numpy/venv/lib/python2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
0.966666666667

Process finished with exit code 0
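
The DeprecationWarning in the output above comes from the `sklearn.cross_validation` import. As a hedged sketch for newer environments (assuming Python 3 and scikit-learn 0.18 or later), the same steps can be written with `sklearn.model_selection`. The `random_state=0` passed to `train_test_split` and the name `clf` are additions here, not part of the original script, so the printed score will not necessarily match the 0.9667 shown above.

```python
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation
from sklearn.svm import SVC

# Same synthetic data as in the script above
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=22,
                           n_clusters_per_class=1, scale=100)

# Standardize each feature to zero mean and unit variance
X = preprocessing.scale(X)

# random_state=0 is an assumption added here to make the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC()
clf.fit(X_train, y_train)

# Mean accuracy on the held-out test data
score = clf.score(X_test, y_test)
print(score)
```

With this import the script should run without the warning shown in the output.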

Note:

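With the features standardized, the evaluation score is above 0.9; without standardization, the score is below 0.5. The raw features here are multiplied by 100 (scale=100), and SVC's default RBF kernel is sensitive to the magnitude of its inputs.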
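
The comparison in the note can be checked with a side-by-side run. The following is a minimal sketch, not the original author's script. It assumes Python 3 and a recent scikit-learn, uses a hypothetical helper `evaluate`, fixes `random_state=0` for the split, and passes `gamma='auto'` explicitly. That last choice matters: the default `gamma` of `SVC` changed from `'auto'` to `'scale'` in scikit-learn 0.22, and `'scale'` already takes the variance of the features into account, so with the newer default the gap between the two runs may be much smaller. No scores are quoted here because they depend on the split and the library version.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same synthetic data as in the script above (features multiplied by 100)
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=22,
                           n_clusters_per_class=1, scale=100)

# preprocessing.scale subtracts the column mean and divides by the column
# standard deviation, so each feature ends up with mean ~0 and std ~1
X_scaled = preprocessing.scale(X)
print("column means:", X_scaled.mean(axis=0))
print("column stds: ", X_scaled.std(axis=0))


def evaluate(features, labels):
    """Hypothetical helper: split, train an SVC, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=0)  # seed is an assumption
    clf = SVC(gamma='auto')  # 'auto' was the default before scikit-learn 0.22
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


raw_score = evaluate(X, y)
scaled_score = evaluate(X_scaled, y)
print("without standardization:", raw_score)
print("with standardization:   ", scaled_score)
```

Both runs use the same split, so the only difference between them is whether the features were standardized.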