Problem description
I am trying to standardize some data so that I can apply PCA to it. I am using sklearn.preprocessing.StandardScaler. I am having trouble understanding the difference between setting the parameters with_mean and with_std to True or False (documentation).
Can someone give a more detailed explanation?
Recommended answer
I have provided more details in this thread, but let me explain it here as well.
Standardizing the data (each column/feature/variable individually) involves the following equation:
z = (x - μ) / σ
where μ is the mean of the column and σ is its standard deviation (std).
Explanation:
If you set with_mean and with_std to False, no centering and no scaling is applied: the mean μ is effectively taken to be 0 and the std σ to be 1, so the data is returned unchanged. This is only appropriate if the columns/features already have mean 0 and std 1, as a standard normal (Gaussian) distribution does.
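As a quick check of the both-False case, here is a minimal sketch. The toy array X is made up for illustration and is not from the original question; the expectation is simply that the scaler hands the data back untouched.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (an assumption for illustration): two columns on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Neither centering nor scaling is requested.
scaler = StandardScaler(with_mean=False, with_std=False)
X_out = scaler.fit_transform(X)

# The output should be identical to the input.
print(np.allclose(X_out, X))
```

In other words, with both flags off the scaler acts as a pass-through, which is rarely what you want before PCA.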
If you set with_mean and with_std to True (the defaults), then the actual μ and σ computed from your data are used. This is the most common approach.
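The both-True case can be compared against a manual computation. This is a sketch on made-up toy data, assuming the population standard deviation (NumPy's default, ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (an assumption for illustration): two columns on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Center each column on its mean and divide by its standard deviation.
X_scaled = StandardScaler(with_mean=True, with_std=True).fit_transform(X)

# The same computation by hand, column by column (axis=0), with ddof=0.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0))  # each column mean should be approximately 0
print(X_scaled.std(axis=0))   # each column std should be approximately 1
```

After this transform every column has mean 0 and std 1, so no single feature dominates the PCA just because of its units.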