Problem description
How do I find the cosine similarity between vectors?
I need to find the similarity to measure the relatedness between two lines of text.
For example, I have two sentences like:
user interface machine
… and their respective vectors after tf-idf, followed by normalisation using LSI, for example [1,0.5] and [0.5,1].
How do I measure the similarity between these vectors?
Recommended answer
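import Jama.Matrix;

// AbstractSimilarity is not shown here; see the full source code referenced below.
public class CosineSimilarity extends AbstractSimilarity {
    @Override
    protected double computeSimilarity(Matrix sourceDoc, Matrix targetDoc) {
        // Element-wise product summed via norm1() (maximum absolute column sum).
        // This equals the dot product only for column vectors with
        // non-negative entries, which holds for tf-idf weights.
        double dotProduct = sourceDoc.arrayTimes(targetDoc).norm1();
        // Product of the two Euclidean (Frobenius) norms.
        double normProduct = sourceDoc.normF() * targetDoc.normF();
        return dotProduct / normProduct;
    }
}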
I did some tf-idf stuff recently for my Information Retrieval unit at university. I used the CosineSimilarity method shown above, which uses Jama: Java Matrix Package.
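To make explicit what that method is meant to compute, here is the standard cosine similarity formula worked through on the two example vectors from the question. The symbols a and b are only labels for those vectors, introduced here for illustration.

\[
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
= \frac{1 \times 0.5 + 0.5 \times 1}{\sqrt{1^2 + 0.5^2} \, \sqrt{0.5^2 + 1^2}}
= \frac{1}{1.25} = 0.8
\]

A value of 1 means the vectors point in the same direction and 0 means they are orthogonal, so 0.8 suggests the two example documents are fairly closely related.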
For the full source code see IR Math with Java: Similarity Measures, a really good resource that covers a good few different similarity measures.
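If you would rather not pull in Jama, the same calculation can be sketched with plain arrays. This is only a minimal sketch, not the code from the linked article: the class name CosineSimilaritySketch and its method are hypothetical, and it assumes both vectors are dense double arrays of equal length. Because it sums the products directly instead of going through norm1(), it also keeps the sign of the dot product when some components are negative.

```java
// Hypothetical helper for illustration; not part of Jama or the linked article.
final class CosineSimilaritySketch {

    /** Returns the cosine of the angle between a and b. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normASquared = 0.0;
        double normBSquared = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];          // accumulate the dot product
            normASquared += a[i] * a[i]; // accumulate squared Euclidean norms
            normBSquared += b[i] * b[i];
        }
        if (normASquared == 0.0 || normBSquared == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normASquared) * Math.sqrt(normBSquared));
    }

    public static void main(String[] args) {
        // The two example vectors from the question.
        double[] first = {1.0, 0.5};
        double[] second = {0.5, 1.0};
        System.out.println(cosineSimilarity(first, second)); // approximately 0.8
    }
}
```

Rejecting zero vectors explicitly is a design choice in this sketch; without the check the division would quietly return NaN.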