Question
I am looking for an open source MinHash implementation that I can use in my work.
The functionality I need is very simple: given a set as input, the implementation should return its MinHash.
A Python or C implementation would be preferred, in case I need to modify it to fit my needs.
Any pointers would be of great help.
Thanks.
Recommended answer
Have a look at the following open source libraries, in this order. All of them are written in Python and show how to compute document similarity using LSH/MinHash:
lsh
LSHHDC: Locality-Sensitive Hashing based High Dimensional Clustering
MinHash
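For reference, the "set in, MinHash out" behaviour asked for in the question can be sketched in a few lines of standard-library Python. This is only a minimal illustration and is not taken from any of the libraries listed above: the names minhash, estimate_jaccard and _hash64, and the num_perm parameter, are hypothetical, and seeded BLAKE2b hashes stand in for true random permutations.

```python
import hashlib


def _hash64(item: str, seed: int) -> int:
    """Hypothetical helper: a 64-bit hash of `item`, varied by `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items, num_perm=128):
    """Return a MinHash signature: one minimum hash value per seeded hash function."""
    unique = {str(x) for x in items}
    if not unique:
        raise ValueError("MinHash of an empty set is undefined")
    # Each seed approximates one random permutation of the element universe.
    return [min(_hash64(x, seed) for x in unique) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    a = minhash({"the", "quick", "brown", "fox"})
    b = minhash({"the", "quick", "brown", "dog"})
    print(estimate_jaccard(a, b))  # an estimate of the true Jaccard similarity (3/5)
```

A larger num_perm gives a tighter estimate at the cost of more hashing. For large collections, a maintained library is likely a better choice than this sketch.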