Using geo distance functions on ELKI

Problem description

I am using ELKI to mine some geospatial data (lat, long pairs), and I am quite concerned about using the right data types and algorithms. In the parameterizer of my algorithm, I tried to replace the default distance function with a geo function (LngLatDistanceFunction, as I am using x, y data), as below:

params.addParameter(DISTANCE_FUNCTION_ID, geo.LngLatDistanceFunction.class);

However, the results are quite surprising: it creates clusters consisting of a single repeated point, as in the example below:

(2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN)]

Here is a picture of this example.

Whereas if I use a non-geo distance (for instance, Manhattan):

params.addParameter(DISTANCE_FUNCTION_ID, geo.minkowski.ManhattanDistanceFunction.class);

the output is much more reasonable.
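When comparing the two settings, it may help to remember that a geodetic distance and a Manhattan distance on raw degrees live on very different scales, so a neighborhood radius that suits one will not suit the other. The sketch below is plain Java and is not ELKI's implementation: it uses the haversine formula on a spherical Earth with an assumed mean radius of 6,371 km, and the class and method names are hypothetical. Which unit ELKI's LngLatDistanceFunction reports should be checked against the documentation of the ELKI version in use.

```java
final class GeoDistanceSketch {
    // Assumed mean Earth radius in meters (spherical approximation).
    static final double EARTH_RADIUS_M = 6_371_000.0;

    // Great-circle distance in meters between two (longitude, latitude) pairs in degrees.
    static double haversineMeters(double lng1, double lat1, double lng2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lng2 - lng1);
        double sinPhi = Math.sin(dPhi / 2);
        double sinLambda = Math.sin(dLambda / 2);
        double a = sinPhi * sinPhi + Math.cos(phi1) * Math.cos(phi2) * sinLambda * sinLambda;
        // Clamp to 1.0 to guard against rounding slightly above the valid asin range.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Manhattan distance computed directly on degree values (no unit conversion).
    static double manhattanDegrees(double lng1, double lat1, double lng2, double lat2) {
        return Math.abs(lng1 - lng2) + Math.abs(lat1 - lat2);
    }

    public static void main(String[] args) {
        double lng = 2.17199922;
        double lat = 41.38190043;
        // Same pair of points, two very different magnitudes.
        System.out.println("haversine (m):   " + haversineMeters(lng, lat, lng + 0.001, lat + 0.001));
        System.out.println("manhattan (deg): " + manhattanDegrees(lng, lat, lng + 0.001, lat + 0.001));
    }
}
```

If the clustering radius was chosen with degree values in mind, it would be interpreted quite differently by a distance function that returns meters or kilometers; this is only one possible explanation and should be verified against the actual parameters.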

I wonder if there is something wrong with my code.

I am running the algorithm directly on the db, like this:

Clustering<Model> result = dbscan.run(db);

I then iterate over the results in a loop while constructing the convex hulls:

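for (de.lmu.ifi.dbs.elki.data.Cluster<?> cl : result.getAllClusters()) {
    if (!cl.isNoise()) {
        // Collect the coordinates of every member of this cluster.
        Coordinate[] ptList = new Coordinate[cl.size()];
        int ct = 0;

        for (DBIDIter iter = cl.getIDs().iter(); iter.valid(); iter.advance()) {
            // Look up the point by the string form of its DBID.
            ptList[ct] = dataMap.get(DBIDUtil.toString(iter));
            ++ct;
        }

        // Build the hull and write it out only if it is a proper polygon.
        GeoPolygon poly = getBoundaryFromCoordinates(ptList);
        // Compare string contents with equals() rather than ==.
        if ("Polygon".equals(poly.getCoordinates().getGeometryType())) {
            out.write(poly.coordinates.toText() + "\n");
        }
    }
}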

To map each ID to a point, I use a hash map that I initialize when reading the database. I am including this code because I suspect that I may be doing something wrong with the structures that I pass to, or read from, the algorithm. Thank you in advance for any comments that could help me solve this. I find ELKI a very efficient and sophisticated library, but I have trouble finding examples that illustrate simple cases like mine.
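One way to narrow down whether the clusters really contain a single repeated point, or whether the ID-to-point lookup keeps returning the same entry, is to count the distinct coordinates in each cluster after the lookup. The following is a minimal sketch in plain Java with no ELKI calls; the class and method names are hypothetical, and points are assumed to be given as `{x, y}` arrays.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DistinctPointsSketch {
    // Counts how many different (x, y) pairs occur among the given points.
    static int countDistinct(double[][] points) {
        Set<List<Double>> seen = new HashSet<>();
        for (double[] p : points) {
            // A list of boxed doubles has value-based equals/hashCode,
            // so identical coordinates collapse into one set entry.
            seen.add(Arrays.asList(p[0], p[1]));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        double[][] cluster = {
            {2.17199922, 41.38190043},
            {2.17199922, 41.38190043},
            {2.17199922, 41.38190043},
        };
        System.out.println("distinct points in cluster: " + countDistinct(cluster));
    }
}
```

Comparing this count with the number of distinct points in the input data can indicate whether the duplication originates in the data itself or in the lookup step.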

Recommended answer