How to index with ELKI

Question

I'm an ELKI beginner, and I've been using it to cluster around 10K lat-lon points from a .csv file. Once I get my settings right, I'd like to scale up to 1 million points.

I'm using the OPTICSXi algorithm with LngLatDistanceFunction.

I keep reading that "enabling R*-tree index with STR bulk loading" gives vast improvements in performance, but the tutorials haven't helped me much.

Any tips on how I can implement this feature?

Recommended answer

The suggested parameters for using a spatial R*-tree index on 2-dimensional data are:

-db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory
-pagefile.pagesize 512
-spatial.bulkstrategy SortTileRecursiveBulkSplit
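
To give an intuition for what Sort-Tile-Recursive (STR) bulk loading does, here is a minimal Python sketch of the leaf-packing step for 2-dimensional points. It is an illustration of the general technique only, not ELKI's implementation; the function names `str_pack` and `bounding_box` and the sample data are made up for this example.

```python
import math


def str_pack(points, capacity):
    """Group 2-D points into leaf pages of at most `capacity` points (STR)."""
    n = len(points)
    if n == 0:
        return []
    leaf_count = math.ceil(n / capacity)            # pages needed in total
    slice_count = math.ceil(math.sqrt(leaf_count))  # number of vertical slices
    slice_size = slice_count * capacity             # points per vertical slice

    by_x = sorted(points, key=lambda p: p[0])       # 1. sort by first coordinate
    leaves = []
    for i in range(0, n, slice_size):               # 2. cut into vertical slices
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            leaves.append(vertical_slice[j:j + capacity])  # 3. pack into pages
    return leaves


def bounding_box(leaf):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of one leaf."""
    xs = [p[0] for p in leaf]
    ys = [p[1] for p in leaf]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample data: 10 points, at most 3 points per page.
points = [(x, (x * 7) % 10) for x in range(10)]
leaves = str_pack(points, capacity=3)
for leaf in leaves:
    print(bounding_box(leaf), leaf)
```

Because the points are sorted before being packed, every page ends up nearly full and covers a compact rectangle, which is why a bulk-loaded tree is usually faster to build and to query than one filled by repeated insertion.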

Higher-dimensional data needs larger page sizes. A page size of 512 to 1024 bytes seems to be the sweet spot for 2-dimensional data, but it also depends on your data.

To discretize clusters, you can use the Xi extraction:

-algorithm clustering.optics.OPTICSXi -opticsxi.xi 0.005

To benefit from index acceleration with OPTICS, choose epsilon as small as your application allows. With all of the earth models in ELKI, this parameter is in meters.

-opticsxi.algorithm OPTICSHeap
-algorithm.distancefunction geo.LatLngDistanceFunction
-optics.epsilon 2000.0 -optics.minpts 10

These settings use a maximum distance of 2 km (epsilon = 2000 m).
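
As a rough sketch of how the parameters above fit together on one command line, the following Python snippet assembles them into a single invocation and prints it. The jar path `elki.jar` and the input file `points.csv` are hypothetical placeholders, and the use of `KDDCLIApplication` with `-dbc.in` is an assumption about how your ELKI version is launched; check both against your installation.

```python
import shlex

ELKI_JAR = "elki.jar"      # hypothetical path to your ELKI jar
INPUT_CSV = "points.csv"   # hypothetical input file

args = [
    "java", "-jar", ELKI_JAR, "KDDCLIApplication",  # assumed launcher
    "-dbc.in", INPUT_CSV,                            # assumed input parameter
    # Spatial index with STR bulk loading (from the answer above)
    "-db.index", "tree.spatial.rstarvariants.rstar.RStarTreeFactory",
    "-pagefile.pagesize", "512",
    "-spatial.bulkstrategy", "SortTileRecursiveBulkSplit",
    # Xi cluster extraction on top of OPTICS
    "-algorithm", "clustering.optics.OPTICSXi",
    "-opticsxi.xi", "0.005",
    "-opticsxi.algorithm", "OPTICSHeap",
    # Distance function and OPTICS parameters (epsilon in meters)
    "-algorithm.distancefunction", "geo.LatLngDistanceFunction",
    "-optics.epsilon", "2000.0",
    "-optics.minpts", "10",
]

command = shlex.join(args)  # quote arguments safely for a POSIX shell
print(command)
```

Keeping the arguments in a list makes it easy to change one value, such as the page size or epsilon, while comparing run times on the 10K sample before scaling up.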

Make sure to distinguish between latitude,longitude and longitude,latitude. Both orders are in use, and you need to pick the distance function that matches the column order of your data:

geo.LatLngDistanceFunction
geo.LngLatDistanceFunction
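
To see why the column order matters, here is a small Python sketch that computes a great-circle distance in meters with the haversine formula, once with the coordinates in the right order and once with them swapped. It uses a spherical approximation with a mean Earth radius and is not ELKI's earth model; the function name `haversine_m` and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two hypothetical points at 60 degrees north, one degree of longitude apart.
correct = haversine_m(60.0, 10.0, 60.0, 11.0)
# The same numbers with latitude and longitude swapped by mistake.
swapped = haversine_m(10.0, 60.0, 11.0, 60.0)

print(f"correct order: {correct:.0f} m")
print(f"swapped order: {swapped:.0f} m")
```

With these sample points the swapped order roughly doubles the distance (about 111 km instead of about 56 km), so an epsilon tuned in meters would behave very differently if the wrong distance function were selected.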
