Problem description
I have written Python code that implements the DBSCAN clustering algorithm. My dataset consists of 14k users, each represented by 10 features. I am unable to decide what values to use for the Min_samples and epsilon inputs. How should I decide? The similarity measure is Euclidean distance, which makes it even tougher to decide. Any pointers?
The parameters of DBSCAN are often hard to estimate.
Have you considered the OPTICS algorithm? In that case you only need Min_samples, which corresponds to the minimal cluster size.
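To make the OPTICS suggestion concrete, here is a minimal sketch using scikit-learn's OPTICS class. It assumes scikit-learn is what you have available (the question does not say which implementation is used), and the synthetic data, the scaling step and min_samples=20 are illustrative placeholders rather than recommended values.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 3 blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(200, 10)) for c in centers])

# Euclidean distance is scale-sensitive, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Only min_samples is set; no epsilon has to be chosen up front.
# min_samples=20 is a hypothetical value for illustration.
optics = OPTICS(min_samples=20, metric="euclidean")
labels = optics.fit_predict(X_scaled)

# Label -1 marks points that OPTICS treats as noise.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

In scikit-learn, min_cluster_size defaults to min_samples when it is not given, which matches the "minimal cluster size" reading above.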
Otherwise, for DBSCAN, I have done it in the past by trial and error: try some values and see what happens. A general rule to follow is that if your dataset is noisy, you should use a larger Min_samples value, and the right value also grows with the number of dimensions (10 in this case).
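The trial-and-error approach could look roughly like the sketch below, which sweeps a few epsilon values for a fixed Min_samples and reports the cluster count and noise fraction for each. It assumes scikit-learn's DBSCAN rather than a hand-written implementation, and the synthetic data, the candidate eps values and min_samples=20 are all hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 3 blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(200, 10)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

min_samples = 20  # hypothetical starting value; raise it for noisier data
results = []
for eps in (0.25, 0.5, 1.0, 2.0):  # hypothetical candidate values
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    labels = model.fit_predict(X_scaled)
    # Label -1 marks noise; every other label is a cluster id.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    noise_fraction = float(np.mean(labels == -1))
    results.append((eps, n_clusters, noise_fraction))
    print(f"eps={eps}: clusters={n_clusters}, noise={noise_fraction:.2%}")
```

An epsilon that is too small labels almost everything as noise, while one that is too large merges everything into a single cluster, so the useful range usually lies between those two extremes.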