Problem description
I am trying to standardize some data to be able to apply PCA to it. I am using sklearn.preprocessing.StandardScaler. I am having trouble understanding the difference between using "True" or "False" in the parameters "with_mean" and "with_std". Here is the description of the command:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Can someone give a more extended explanation?
Thank you very much!
I have provided more details in this post https://stackoverflow.com/a/50879522/5025009, but let me just explain this here as well.
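The standardization of the data (each column/feature/variable individually) involves the following equation:

$$ z = \frac{x - \mu}{\sigma} $$

where x is a value in a given column, μ is the mean of that column and σ is its standard deviation.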
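As a minimal sketch, the formula can be checked against StandardScaler with its default settings. The small array X below is made up purely for illustration. Note that StandardScaler uses the population standard deviation (numpy.std with ddof=0), so the manual computation does the same.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up dataset for illustration: 4 samples, 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 60.0]])

# Manual standardization, column by column: z = (x - mu) / sigma
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population std (ddof=0), as StandardScaler uses
Z_manual = (X - mu) / sigma

# StandardScaler with its defaults (with_mean=True, with_std=True)
scaler = StandardScaler()
Z_sklearn = scaler.fit_transform(X)

print(np.allclose(Z_manual, Z_sklearn))  # True
print(scaler.mean_)   # the learned mu of each column
print(scaler.scale_)  # the learned sigma of each column
```

The fitted `mean_` and `scale_` attributes hold exactly the μ and σ that appear in the equation, one value per column.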
Explanation:
If you set with_mean and with_std to False, then the mean μ is set to 0 and the std σ to 1, so the equation reduces to z = x and the data is left unchanged. This amounts to assuming that the columns/features already come from the standard normal (Gaussian) distribution, which has mean 0 and std 1.
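The two flags act independently: with_mean controls the subtraction of μ (centering) and with_std controls the division by σ (scaling). A minimal sketch of all three non-default combinations, again on a made-up array X:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up dataset for illustration
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 60.0]])
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Both flags False: mu is treated as 0 and sigma as 1, so z = (x - 0) / 1 = x
Z_none = StandardScaler(with_mean=False, with_std=False).fit_transform(X)

# Centering only: z = x - mu (each column gets mean 0, spread unchanged)
Z_center = StandardScaler(with_mean=True, with_std=False).fit_transform(X)

# Scaling only: z = x / sigma (each column gets std 1, but is not centered)
Z_scale = StandardScaler(with_mean=False, with_std=True).fit_transform(X)

print(np.allclose(Z_none, X))           # True
print(np.allclose(Z_center, X - mu))    # True
print(np.allclose(Z_scale, X / sigma))  # True
```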
If you set with_mean and with_std to True (the default for both), then you will actually use the true μ and σ of your data, computed separately for each column from the data passed to fit. This is the most common approach.
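For the PCA use case in the question, a minimal sketch with the default scaler in a pipeline might look like the following. The random data with three features on very different scales is an assumption made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 3 features on very different scales
X = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 100.0, 0.01], size=(100, 3))

# Defaults are with_mean=True, with_std=True: each column is centered with
# its own mean and divided by its own standard deviation before PCA
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X_pca = pipe.fit_transform(X)

# Inspect the standardized data produced by the fitted scaler step
Z = pipe.named_steps["standardscaler"].transform(X)
print(np.allclose(Z.mean(axis=0), 0.0))  # True: every column has mean 0
print(np.allclose(Z.std(axis=0), 1.0))   # True: every column has std 1
print(X_pca.shape)                       # (100, 2)
```

Since scikit-learn's PCA centers its input itself, for PCA it is mainly with_std that changes the result: without it, the column with the largest variance (here the second one) would dominate the principal components.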