I'm using the kmodes Python library. Could someone explain what these parameters mean?
Link:
https://github.com/nicodv/kmodes#huang97
km = kmodes.KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)
I know n_clusters is the number of clusters the data is grouped into, but what do the other parameters do?
Best answer
From the source code:
Parameters
-----------
n_clusters : int, optional, default: 8
    The number of clusters to form as well as the number of
    centroids to generate.
max_iter : int, default: 300
    Maximum number of iterations of the k-modes algorithm for a
    single run.
cat_dissim : func, default: matching_dissim
    Dissimilarity function used by the algorithm for categorical variables.
    Defaults to the matching dissimilarity function.
init : {'Huang', 'Cao', 'random' or an ndarray}, default: 'Cao'
    Method for initialization:
    'Huang': Method in Huang [1997, 1998]
    'Cao': Method in Cao et al. [2009]
    'random': choose 'n_clusters' observations (rows) at random from
    data for the initial centroids.
    If an ndarray is passed, it should be of shape (n_clusters, n_features)
    and gives the initial centroids.
n_init : int, default: 10
    Number of time the k-modes algorithm will be run with different
    centroid seeds. The final results will be the best output of
    n_init consecutive runs in terms of cost.
verbose : int, optional
    Verbosity mode.
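So init is just the method used to choose the initial centroids, and n_init is the number of times the algorithm is run with different starting centroids; the run with the lowest cost is kept as the final result. verbose just controls how much output is written to standard output (i.e. it tells you what stage the algorithm is at, and so on).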
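To make the roles of init, n_init, max_iter and verbose more concrete, here is a rough pure-Python sketch of the "run several times, keep the cheapest" idea. It is not the kmodes implementation: the function names kmodes_once and best_of_n_init and the seed argument are made up for illustration, only a 'random'-style initialization is shown (the Huang and Cao methods are not reproduced), and matching_dissim here is a simplified stand-in for the library's default dissimilarity.

```python
import random
from collections import Counter


def matching_dissim(a, b):
    """Count the positions at which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(rows, modes):
    """Label each row with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissim(row, modes[k]))
        for row in rows
    ]


def kmodes_once(rows, n_clusters, max_iter, rng):
    """One run, 'random'-style init: n_clusters rows become the starting modes."""
    modes = [tuple(row) for row in rng.sample(rows, n_clusters)]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(rows, modes)
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Recompute each mode as the most frequent category in every column.
        for k in range(n_clusters):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                modes[k] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    labels = assign(rows, modes)
    # Cost: total dissimilarity between each row and the mode of its cluster.
    cost = sum(matching_dissim(row, modes[lab]) for row, lab in zip(rows, labels))
    return cost, labels, modes


def best_of_n_init(rows, n_clusters, n_init=10, max_iter=300, seed=0, verbose=0):
    """Repeat the run n_init times and keep the result with the lowest cost."""
    rng = random.Random(seed)  # hypothetical seed argument, for repeatability
    best = None
    for run in range(n_init):
        result = kmodes_once(rows, n_clusters, max_iter, rng)
        if verbose:
            print(f"run {run + 1}/{n_init}: cost={result[0]}")
        if best is None or result[0] < best[0]:
            best = result
    return best


rows = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "medium", "square"),
]
cost, labels, modes = best_of_n_init(rows, n_clusters=2, n_init=5, verbose=1)
```

With verbose=1 the sketch prints one line per run, which is roughly the kind of progress information the real verbose flag gives you. A larger n_init costs more time but makes it less likely that an unlucky set of starting centroids determines the final clustering.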