Problem description
I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand, I should be using the clusterInstance() method for this, but to my surprise, when I looked at the code of that method, it appears that the implementation ignores its parameter:
/**
 * Classifies a given instance.
 *
 * @param instance The instance to be assigned to a cluster
 * @return int The number of the assigned cluster as an integer
 * @throws java.lang.Exception If instance could not be clustered
 * successfully
 */
public int clusterInstance(Instance instance) throws Exception {
    if (processed_InstanceID >= database.size()) processed_InstanceID = 0;
    int cnum = (database.getDataObject(Integer.toString(processed_InstanceID++))).getClusterLabel();
    if (cnum == DataObject.NOISE)
        throw new Exception();
    else
        return cnum;
}
This doesn't seem right. How is that supposed to work? Is there a different method I should be using for clustering? Do I have to run this method sequentially on all instances, in some specific order, if I want to get any useful information out of it?
Recommended answer
As Mark answered, this is obviously a bug. As long as you query about instances in the exact same order in which they were inserted into the clusterer it's okay; but it won't work in any other case.
A co-worker solved this by writing her own version of the DBScan class: essentially identical (copy-pasted), except that it maintains a mapping between instances and cluster labels. This mapping can be produced by iterating over the contents of the database instance. The appropriate cluster for an instance can then be retrieved directly from that mapping.
Editing this method is also a good opportunity to change the throw new Exception into something more sensible in this context, such as return -1.
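The mapping idea described above can be sketched roughly as follows. This is not Weka code: ClusterLabelLookup, its constructor arguments, and the NOISE constant are hypothetical stand-ins, and instances are represented as plain double[] attribute vectors instead of Weka Instance objects. It assumes you already have the cluster labels in insertion order (for example, collected once by iterating over the clusterer's database) and shows a lookup that no longer depends on query order and returns -1 for noise instead of throwing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of Weka: remembers which cluster label
 * each instance received so labels can be looked up in any order.
 */
final class ClusterLabelLookup {
    /** Assumed marker for noise, returned instead of throwing an exception. */
    static final int NOISE = -1;

    private final Map<String, Integer> labelByKey = new HashMap<>();

    /**
     * Builds the mapping once. labels[i] is assumed to be the cluster label
     * assigned to instances.get(i), i.e. both are in insertion order.
     */
    ClusterLabelLookup(List<double[]> instances, int[] labels) {
        if (instances.size() != labels.length) {
            throw new IllegalArgumentException("instances and labels differ in length");
        }
        for (int i = 0; i < labels.length; i++) {
            // Instances with identical attribute values share one key;
            // the label stored last wins.
            labelByKey.put(keyOf(instances.get(i)), labels[i]);
        }
    }

    /** Uses the attribute values themselves as the map key. */
    private static String keyOf(double[] values) {
        return Arrays.toString(values);
    }

    /** Returns the stored label, or NOISE for noise or unknown instances. */
    int clusterInstance(double[] instance) {
        Integer label = labelByKey.get(keyOf(instance));
        return label == null ? NOISE : label;
    }
}
```

With such a helper, a call like lookup.clusterInstance(someVector) gives the same answer no matter how many other instances were queried before it. Treating unknown instances as noise is a design choice of this sketch; a stricter version could throw for instances that were never clustered.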