本文介绍了如何计算两个字符串向量之间的余弦相似度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有2个维数为6的向量,并且我想要一个0到1之间的数字.
I have 2 vectors of dimensions 6 and I would like to have a number between 0 and 1.
a=c("HDa","2Pb","2","BxU","BuQ","Bve")
b=c("HCK","2Pb","2","09","F","G")
任何人都可以解释我该怎么做吗?
Can anyone explain what I should do?
推荐答案
使用lsa
程序包和该程序包的手册
using the lsa
package and the manual for this package
# create some files
library('lsa')
td = tempfile()
dir.create(td)
write( c("HDa","2Pb","2","BxU","BuQ","Bve"), file=paste(td, "D1", sep="/"))
write( c("HCK","2Pb","2","09","F","G"), file=paste(td, "D2", sep="/"))
# read files into a document-term matrix
myMatrix = textmatrix(td, minWordLength=1)
显示mymatrix
对象的状态
myMatrix
#myMatrix
# docs
# terms D1 D2
# 2 1 1
# 2pb 1 1
# buq 1 0
# bve 1 0
# bxu 1 0
# hda 1 0
# 09 0 1
# f 0 1
# g 0 1
# hck 0 1
# Calculate cosine similarity
res <- lsa::cosine(myMatrix[,1], myMatrix[,2])
res
#0.3333
这篇关于如何计算两个字符串向量之间的余弦相似度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!