This article looks at item-to-item collaborative filtering and the question of how to manage the similarity matrix when the item catalog is large.

Problem Description

I am working on a recommendation engine, and one problem I am facing right now is that the similarity matrix of items is huge.

I calculated the similarity matrix for 20,000 items and stored it in a binary file, which turned out to be nearly 1 GB. I think it is too big.
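The ~1 GB figure is roughly what a full dense matrix would occupy. As a quick sanity check (assuming 4-byte float32 entries, which is an assumption; the original post does not say which dtype was used):

```python
# Back-of-the-envelope size of a dense 20,000 x 20,000 similarity matrix,
# assuming each entry is a 4-byte float32.
n = 20_000
bytes_needed = n * n * 4        # one float32 per (item, item) pair
gib = bytes_needed / 2**30      # convert bytes to GiB
print(round(gib, 2))            # prints 1.49
```

So a dense representation is inherently on the order of 1.5 GiB at this item count, regardless of the serialization format.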

What is the best way to deal with the similarity matrix if you have that many items?

Any suggestions?

Accepted Answer

In fact, the similarity matrix describes how similar each object is to the other objects. Each row consists of the neighbors of one object (the row id), but you don't need to store all of the neighbors; store, for example, only the top 20. Use lil_matrix:

from scipy.sparse import lil_matrix
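The idea above can be sketched as follows. This is a minimal illustration, not the answerer's actual code: the function name `topk_similarity`, the value k=20, and the random demo matrix are all assumptions for the example.

```python
import numpy as np
from scipy.sparse import lil_matrix

def topk_similarity(sim_dense, k=20):
    """Keep only the k largest similarities in each row of a dense
    similarity matrix, returning a sparse CSR matrix."""
    n = sim_dense.shape[0]
    sparse_sim = lil_matrix((n, n))
    for i in range(n):
        row = sim_dense[i]
        # indices of the k largest entries in this row
        top = np.argpartition(row, -k)[-k:]
        sparse_sim[i, top] = row[top]
    # CSR is compact and fast for row lookups once construction is done
    return sparse_sim.tocsr()

# Demo on a small random matrix (assumed data, stands in for real similarities)
rng = np.random.default_rng(0)
dense = rng.random((100, 100))
sparse = topk_similarity(dense, k=20)
```

With 20,000 items and 20 neighbors per row, you store 400,000 values plus their indices instead of 400 million entries, which brings the footprint down from gigabytes to a few megabytes. `lil_matrix` is convenient for row-by-row assignment; converting to CSR afterwards gives a compact format for the neighbor lookups a recommender actually performs.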

