Item-based collaborative filtering


Question

I am puzzled about what item-based recommendation is, as described in the book "Mahout in Action". The book gives this algorithm:

for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average

How should I calculate the similarity between items? If I use the items' content, isn't that content-based recommendation?

Recommended answer

Item-Based Collaborative Filtering

Item-based recommendation in its original form relies entirely on user-item ratings (e.g., a user rated a movie 3 stars, or a user "likes" a video). When you compute the similarity between items, you are not supposed to know anything other than all users' rating histories. So the similarity between items is computed from the ratings, not from the metadata of the item content.

Let me give you an example. Suppose you only have access to rating data like the following:

user 1 likes: movie, cooking
user 2 likes: movie, biking, hiking
user 3 likes: biking, cooking
user 4 likes: hiking

Suppose now you want to make recommendations for user 4.

First, create an inverted index for the items. You will get:

movie:     user 1, user 2
cooking:   user 1, user 3
biking:    user 2, user 3
hiking:    user 2, user 4
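
The inverted index above can be built mechanically from the per-user data. The following is a minimal Python sketch, not code from the book or from Mahout; the names `likes` and `build_inverted_index` are made up for illustration.

```python
from collections import defaultdict

# Rating data from the example: user -> set of liked items.
likes = {
    "user 1": {"movie", "cooking"},
    "user 2": {"movie", "biking", "hiking"},
    "user 3": {"biking", "cooking"},
    "user 4": {"hiking"},
}

def build_inverted_index(likes):
    """Map each item to the set of users who like it."""
    index = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            index[item].add(user)
    return dict(index)

inverted_index = build_inverted_index(likes)
for item, users in inverted_index.items():
    print(item, sorted(users))
```

Sets are used here because the ratings are binary, so only membership matters.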

Since this is a binary rating (like or not), we can use a similarity measure such as Jaccard similarity to compute item similarity.

                                 |user1|
similarity(movie, cooking) = --------------- = 1/3
                               |user1,2,3|

In the numerator, user1 is the only user that movie and cooking have in common. In the denominator, the union of movie and cooking has 3 distinct users (user1,2,3). |.| here denotes the size of the set. So the similarity between movie and cooking is 1/3 in our case. You just do the same thing for all possible item pairs (i,j).
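
As a rough illustration, the Jaccard computation for all item pairs might look like the Python sketch below. The dictionary `item_users` restates the inverted index from the example; the function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions of this sketch.

```python
from itertools import combinations

# Inverted index from the example: item -> set of users who like it.
item_users = {
    "movie": {"user 1", "user 2"},
    "cooking": {"user 1", "user 3"},
    "biking": {"user 2", "user 3"},
    "hiking": {"user 2", "user 4"},
}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets count as not similar
    return len(a & b) / len(union)

# Similarity for every unordered item pair (i, j).
pair_similarity = {
    (i, j): jaccard(item_users[i], item_users[j])
    for i, j in combinations(item_users, 2)
}

print(pair_similarity[("movie", "cooking")])  # 1/3, as in the worked example
```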

After you are done with the similarity computation for all pairs, suppose you need to make a recommendation for user 4.

  • User 4 likes only hiking, so look at the similarity scores similarity(hiking, x), where x is any other item you have.

If you need to make a recommendation for user 3, you can aggregate the similarity scores from each item in that user's list. For example,

score(movie)  = Similarity(biking, movie) + Similarity(cooking, movie)
score(hiking) = Similarity(biking, hiking) + Similarity(cooking, hiking)
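
The scoring step can be sketched in Python as follows. This assumes Jaccard similarity and a plain sum as the aggregation, as in the example; `recommend` is a hypothetical helper name, not an API from Mahout.

```python
# Inverted index from the example: item -> set of users who like it.
item_users = {
    "movie": {"user 1", "user 2"},
    "cooking": {"user 1", "user 3"},
    "biking": {"user 2", "user 3"},
    "hiking": {"user 2", "user 4"},
}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, item_users):
    """Rank items the user has not liked by summed similarity to liked items."""
    liked = {item for item, users in item_users.items() if user in users}
    scores = {}
    for candidate in item_users:
        if candidate in liked:
            continue  # only score items the user has no preference for yet
        scores[candidate] = sum(
            jaccard(item_users[candidate], item_users[j]) for j in liked
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking_user3 = recommend("user 3", item_users)
ranking_user4 = recommend("user 4", item_users)
print(ranking_user3)
print(ranking_user4)
```

With this data, movie scores 2/3 and hiking scores 1/3 for user 3, so movie would be recommended first. For user 4, movie and biking tie at 1/3 and cooking scores 0.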

Content-Based Recommendation

The point of content-based recommendation is that we have to know the content of both the user and the item. Usually you construct a user-profile and an item-profile over a shared attribute space. For example, you can represent a movie by the movie stars in it and its genres (using a binary encoding, for example). For the user profile, you can do the same thing based on which movie stars, genres, etc. the user likes. Then the similarity between user and item can be computed using, e.g., cosine similarity.

Here is a concrete example:

Suppose this is our user-profile (using binary encoding, where 0 means not-like and 1 means like), which contains each user's preferences over 5 movie stars and 5 movie genres:

         Movie stars 0 - 4    Movie Genres
user 1:    0 0 0 1 1          1 1 1 0 0
user 2:    1 1 0 0 0          0 0 0 1 1
user 3:    0 0 0 1 1          1 1 1 1 0

Suppose this is our movie-profile:

         Movie stars 0 - 4    Movie Genres
movie1:    0 0 0 0 1          1 1 0 0 0
movie2:    1 1 1 0 0          0 0 1 0 1
movie3:    0 0 1 0 1          1 0 1 0 1

To calculate how well a movie fits a user, we use cosine similarity:

                                 dot-product(user1, movie1)
similarity(user 1, movie1) = ---------------------------------
                                   ||user1|| x ||movie1||

                              0x0+0x0+0x0+1x0+1x1+1x1+1x1+1x0+0x0+0x0
                           = -----------------------------------------
                                         sqrt(5) x sqrt(3)

                           = 3 / (sqrt(5) x sqrt(3)) = 0.77460

Similarly:

similarity(user 2, movie2) = 3 / (sqrt(4) x sqrt(5)) = 0.67082
similarity(user 3, movie3) = 3 / (sqrt(6) x sqrt(5)) = 0.54772
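
The three values above can be checked with a short Python sketch. The vectors are copied from the profile tables (5 movie-star flags followed by 5 genre flags); the function name `cosine` and the zero-vector fallback are assumptions of this sketch.

```python
import math

# Profiles from the tables: 5 movie-star flags followed by 5 genre flags.
users = {
    "user 1": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "user 2": [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "user 3": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
}
movies = {
    "movie1": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "movie2": [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "movie3": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero profile matches nothing
    return dot / (norm_u * norm_v)

sim_11 = cosine(users["user 1"], movies["movie1"])
sim_22 = cosine(users["user 2"], movies["movie2"])
sim_33 = cosine(users["user 3"], movies["movie3"])
print(round(sim_11, 5), round(sim_22, 5), round(sim_33, 5))
```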

If you want to give one recommendation for user i, just pick the movie j that has the highest similarity(i, j).
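
Picking the best movie for each user could be sketched as below, again in Python and with the same profile vectors. The helper name `best_movie` is hypothetical, and ties are broken by dictionary order, which is a choice of this sketch rather than part of the method.

```python
import math

# Profiles from the tables: 5 movie-star flags followed by 5 genre flags.
users = {
    "user 1": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "user 2": [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "user 3": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
}
movies = {
    "movie1": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "movie2": [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "movie3": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def best_movie(user_vector, movies):
    """Return the movie whose profile has the highest cosine similarity."""
    return max(movies, key=lambda name: cosine(user_vector, movies[name]))

best = {user: best_movie(vector, movies) for user, vector in users.items()}
print(best)
```

With these profiles, the sketch picks movie1 for user 1 and user 3, and movie2 for user 2.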

Hope this helps.
