Question
How can I group similar URLs using the DBSCAN algorithm? I have seen many datasets, but none of them consisted of URLs. I want to take URLs of a similar type and group them together. I do not know how to choose the distance threshold (eps), and I assume the minimum number of points (minPts) can be the number of URLs to be grouped.
Recommended answer
DBSCAN needs a distance function and a threshold for detecting similar objects.
So first you need to define an appropriate distance function and a threshold; then we can help you with DBSCAN (though you should be able to find DBSCAN implementations that can be extended to arbitrary distance functions).
The key challenge is the distance, and that choice is up to you: it is very subjective, and we do not know what you want or need to get out of the clustering.
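To make the idea of "a distance function plus a threshold" concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration and not part of the original answer: the helper names (url_tokens, url_distance, dbscan), the choice of Jaccard distance over URL tokens, the sample URLs, and the values eps=0.5 and min_pts=2. The DBSCAN loop is a small hand-written version that accepts any distance function; it is quadratic in the number of URLs and only meant to show the mechanics.

```python
import re
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL into a set of tokens: the host plus path/query pieces."""
    parsed = urlparse(url)
    pieces = re.split(r"[/?&=._-]+", parsed.path + "?" + parsed.query)
    return {parsed.netloc.lower()} | {p.lower() for p in pieces if p}


def url_distance(a, b):
    """Jaccard distance between token sets (an assumed, illustrative choice)."""
    ta, tb = url_tokens(a), url_tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def dbscan(items, distance, eps, min_pts):
    """Minimal DBSCAN; returns one label per item, with -1 meaning noise."""
    n = len(items)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if distance(items[i], items[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical sample data, chosen only to illustrate the behavior.
urls = [
    "http://example.com/products/shoes?id=1",
    "http://example.com/products/shoes?id=2",
    "http://example.com/products/shoes?id=3",
    "http://example.org/blog/2020/dbscan-intro",
    "http://example.org/blog/2020/dbscan-tips",
    "http://other.net/about",
]
labels = dbscan(urls, url_distance, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

With this particular distance, two URLs that differ in a single token out of five are 1/3 apart, so eps=0.5 groups them while URLs sharing no tokens (distance 1.0) stay separate. A different notion of "similar" (same host only, same path template, edit distance on the raw string) would need a different distance function and a different eps.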