I have defined the following function, which takes an input string and compares it against a large list of strings (all vectorized with TF-IDF):

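from sklearn.metrics.pairwise import linear_kernel

def find_new_similar(tfidf_matrix2, index, tfidf_matrix, top_n = 5):
    # Dot product of row `index` of tfidf_matrix2 with every row of tfidf_matrix
    cosine_similarities = linear_kernel(tfidf_matrix2[index:index+1], tfidf_matrix).flatten()
    related_docs_indices = [i for i in cosine_similarities.argsort()[::-1] if i != index]
    # Returns a 2-tuple: (list of the top_n (row, score) pairs, index)
    return [(i, cosine_similarities[i]) for i in related_docs_indices][0:top_n], index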

When I call this function, my output is:
    find_new_similar(tfidf_matrix2, 1, tfidf_matrix)
    Out[15]:
    ([(923576, 0.51192576542407131),
      (558563, 0.51192576542407131),
      (1554977, 0.51192576542407131),
      (1604772, 0.51192576542407131),
      (514529, 0.50251903670563314)],
     1)

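The first element of each tuple (e.g. 923576, 558563) is an index into a large file of terms. I want to use these indices to return the values at those positions.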
I have tried:


for i, score in find_new_similar(tfidf_matrix2, 0, tfidf_matrix):
       print (score, corpus[i], i)
Traceback (most recent call last):

  File "<ipython-input-18-792db65f6fd0>", line 1, in <module>
    for i, score in find_new_similar(tfidf_matrix2, 0, tfidf_matrix):

ValueError: too many values to unpack (expected 2)

Can anyone help? Thanks.

Best answer

Your function returns a tuple containing both the list and index:

return [(i, cosine_similarities[i]) for i in related_docs_indices][0:top_n], index
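To see why the loop fails, here is a minimal sketch that reproduces the error without scikit-learn. The value of result is a hand-written stand-in for what find_new_similar returns, with scores rounded from the output in the question; it is an illustration, not the real data.

```python
# Hypothetical stand-in for the return value of find_new_similar:
# a 2-tuple of (list of (row, score) pairs, query index).
result = ([(923576, 0.51), (558563, 0.51), (1554977, 0.51),
           (1604772, 0.51), (514529, 0.50)], 1)

# Iterating over the tuple yields the whole list first, then the integer 1.
first_item = next(iter(result))
print(len(first_item))  # the list holds five (row, score) pairs

try:
    # Python tries to unpack the five-element list into just i and score.
    for i, score in result:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The loop never reaches the individual (row, score) pairs, because the first thing it receives is the entire list.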

Change your code to:
for i, score in find_new_similar(tfidf_matrix2, 0, tfidf_matrix)[0]:
    print(score, corpus[i], i)

The [0] selects the list from the returned tuple, so the loop iterates over the (index, score) pairs.
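The fix can be sketched on toy data as follows. Everything here is an assumption made for illustration: fake_find_new_similar only mimics the return shape of the real function, and scores and corpus are made-up values. The second variant, unpacking the tuple into two names, is an alternative to indexing with [0] that some may find easier to read.

```python
def fake_find_new_similar(top_n=5):
    """Hypothetical stand-in that mimics only the return shape."""
    scores = [0.2, 0.9, 0.5, 0.7]  # made-up similarity scores
    index = 1                      # made-up query row
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    related = [i for i in ranked if i != index]
    return [(i, scores[i]) for i in related][:top_n], index

corpus = ["alpha", "beta", "gamma", "delta"]  # made-up documents

# Option 1: index into the returned tuple, as in the answer.
for i, score in fake_find_new_similar()[0]:
    print(score, corpus[i], i)

# Option 2: unpack the tuple first, then iterate over the list.
matches, query_index = fake_find_new_similar()
rows = [(score, corpus[i], i) for i, score in matches]
```

Both variants visit the same pairs; unpacking also keeps the returned index available if it is needed later.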

Regarding python - Python ValueError: too many values to unpack (expected…), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50061137/
