Question
I have a function that takes two strings and returns their cosine similarity, a value that indicates how closely the two texts are related.
To compare 75 texts with each other, I need 5,625 individual comparisons (75 × 75) so that every text is compared with every other.
Is there a way to reduce this number of comparisons, for example with sparse matrices or k-means?
I don't want to discuss my function or other ways of comparing texts, only how to reduce the number of comparisons.
Recommended answer
What Ben says is true: to get better help, you need to tell us what the goal is.
For example, if you want to find similar strings, one possible optimization is to store the string vectors in a spatial data structure such as a quadtree. You can then outright discard vectors that are too far away from each other, avoiding many comparisons.
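To make the pruning idea concrete, here is a minimal sketch. It uses a uniform grid rather than a quadtree, because a grid is the simplest spatial index that shows the same effect. It also assumes 2-D points and a Euclidean distance threshold. The function name `close_pairs` and the `radius` parameter are made up for illustration and do not come from the question.

```python
from collections import defaultdict
from math import dist, floor


def close_pairs(points, radius):
    """Return (pairs, checked): index pairs within `radius`, plus how many
    distance computations were actually performed."""
    # Bucket every point index by the grid cell that contains it.
    # Cell side length equals `radius` (an assumption of this sketch).
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / radius), floor(y / radius))].append(idx)

    pairs = set()
    checked = 0
    for (cx, cy), members in grid.items():
        # Two points within `radius` of each other always fall in the same
        # cell or in adjacent cells, so all other cells are skipped.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # visit each unordered pair only once
                            checked += 1
                            if dist(points[i], points[j]) <= radius:
                                pairs.add((i, j))
    return pairs, checked


# Hypothetical data: two tight clusters and one isolated point.
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.05, 5.0), (9.0, 1.0)]
pairs, checked = close_pairs(points, 0.5)
print(sorted(pairs), checked)
```

If the vectors are normalized to unit length, a cosine similarity of s corresponds to a Euclidean distance of sqrt(2 - 2s), so a similarity threshold can be turned into a radius like the one above. Real text vectors usually have far more than two dimensions, where grids and quadtrees become impractical, so treat this only as an illustration of the principle.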