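I am running code to compute the Pearson correlation coefficient, and the function (pasted below) stubbornly returns 0.
Following earlier advice on this problem (see links #1 and #2 below), I made sure the function performs floating-point calculations, but that did not help. I would appreciate some guidance.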
from __future__ import division
from math import sqrt

def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item]=1
    # Find the number of elements
    n=float(len(si))
    # if they are no ratings in common, return 0
    if n==0: return 0
    # Add up all the preferences
    sum1=float(sum([prefs[p1][it] for it in si]))
    sum2=float(sum([prefs[p2][it] for it in si]))
    # Sum up the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])
    # Sum up the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])
    # Calculate Pearson score
    num=pSum-(1.0*sum1*sum2/n)
    den=sqrt((sum1Sq-1.0*pow(sum1,2)/n)*(sum2Sq-1.0*pow(sum2,2)/n))
    if den==0: return 0
    r=num/den
    return r
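The function has two places where it returns 0: when there are no shared items, and when the denominator is 0. The following is a minimal sketch (written for Python 3) of a helper that exposes the intermediate terms so you can see which case applies. The name pearson_terms and the toy data are hypothetical and are not part of the book's code. It illustrates one case where float arithmetic cannot help: if only one shared item is counted, both variance terms vanish and the denominator is 0.

```python
from math import sqrt

def pearson_terms(prefs, p1, p2):
    """Return (n, num, den) for the items that both p1 and p2 have rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = float(len(shared))
    if n == 0:
        return n, 0.0, 0.0
    sum1 = sum(prefs[p1][it] for it in shared)
    sum2 = sum(prefs[p2][it] for it in shared)
    sum1_sq = sum(prefs[p1][it] ** 2 for it in shared)
    sum2_sq = sum(prefs[p2][it] ** 2 for it in shared)
    p_sum = sum(prefs[p1][it] * prefs[p2][it] for it in shared)
    num = p_sum - sum1 * sum2 / n
    # Clamp at 0 so rounding error cannot make the product negative
    den = sqrt(max(0.0, (sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n)))
    return n, num, den

# Hypothetical toy data: the two raters share exactly one item ('x')
toy = {'a': {'x': 2.0, 'y': 3.0}, 'b': {'x': 4.0}}
n, num, den = pearson_terms(toy, 'a', 'b')
print(n, num, den)  # 1.0 0.0 0.0
```

If n comes back much smaller than expected for your data, the loop that collects the shared items is the first thing to check.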
My dataset:
# A dictionary of movie critics and their ratings of a small
# set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
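Not every critic in this dictionary rated every film, so the number of shared items differs from pair to pair. As a rough sketch (Python 3), the shared titles can be listed with a set intersection. The helper name shared_items is hypothetical, and critics_subset copies only three of the seven critics above to keep the example short.

```python
# Three of the seven critics from the dictionary above
critics_subset = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

def shared_items(prefs, p1, p2):
    """Return the sorted titles rated by both p1 and p2."""
    return sorted(set(prefs[p1]) & set(prefs[p2]))

for a, b in [('Lisa Rose', 'Toby'), ('Michael Phillips', 'Toby')]:
    common = shared_items(critics_subset, a, b)
    print(a, '/', b, '->', len(common), 'shared:', common)
```

Lisa Rose and Toby share three titles, while Michael Phillips and Toby share only two, so n in the Pearson calculation is 3 and 2 respectively.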
Other similar questions:
Link 1:
What is wrong with this python function from "Programming Collective Intelligence"?
Link 2:
What is wrong with the pearson algorithm from “Programming Collective Intelligence”?
Best answer
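Thanks, everyone, for the help in the comments; I found the problem. Just kidding: there were a lot of problems. In the end, I noticed that the for loop was not nested properly (at line 6), and it needs to be. At one stage before the end, I frantically wrapped everything in float, sorry about that. Anyway, you wanted floats. Before that, there was the fact that he had not called keys() on critics, which is what he needed. Also, the Pearson coefficient was calculated wrong, to the point that it took a mathematician to fix it (I have a BSc in maths). Now it gives the right result for his test with Gene Seymour and Lisa Rose. Anyway, save this as pearson.py, or whatever: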
from __future__ import division, print_function
from math import sqrt

def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
    si={}
    for item in prefs[p1].keys():
        # Keep only the items that p2 has rated too; a second loop over
        # prefs[p2] here would add items that p1 never rated
        if item in prefs[p2].keys():
            si[item]=1
    # Find the number of elements
    n=float(len(si))
    # if they are no ratings in common, return 0
    if n==0:
        print('n=0')
        return 0
    # Add up all the preferences
    sum1=float(sum([prefs[p1][it] for it in si.keys()]))
    sum2=float(sum([prefs[p2][it] for it in si.keys()]))
    print('sum1=', sum1, 'sum2=', sum2)
    # Sum up the squares
    sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
    sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
    print('sum1s=', sum1Sq, 'sum2s=', sum2Sq)
    # Sum up the products
    pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))
    # Calculate Pearson score
    num=(pSum/n)-(1.0*sum1*sum2/pow(n,2))
    den=sqrt(((sum1Sq/n)-float(pow(sum1,2))/float(pow(n,2)))*((sum2Sq/n)-float(pow(sum2,2))/float(pow(n,2))))
    if den==0:
        print('den=0')
        return 0
    r=num/den
    return r

critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
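'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}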
Then type:
import pearson
pearson.sim_pearson(pearson.critics, list(pearson.critics.keys())[1], list(pearson.critics.keys())[2])
Or simply:
import pearson
pearson.sim_pearson(pearson.critics, 'Lisa Rose', 'Gene Seymour')
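If you have any problems getting it to work, let me know. I left in the print statements I used for troubleshooting so you can see how I worked through it, but obviously they are not needed. If you run into more problems with this book that you cannot solve with the help of SO, that is, email me at raphael[at]postacle.com and I should be able to get back to you. I downloaded it recently too; I have just been a bit lazy ;)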
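A quick way to check whichever version of the formula you end up with is to compare it against the mean-centered definition of the Pearson coefficient. The sketch below (Python 3) does this for Lisa Rose and Gene Seymour. The function name pearson_from_means is hypothetical, and the two lists are their ratings from the dataset in the question, in the order Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me and Dupree, The Night Listener.

```python
from math import sqrt

def pearson_from_means(xs, ys):
    """Pearson correlation computed from deviations around the means."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    # Same convention as sim_pearson: report 0 when the denominator is 0
    return cov / den if den else 0.0

lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]
r = pearson_from_means(lisa, gene)
print(round(r, 6))  # 0.396059
```

Dividing both the numerator and the denominator by n leaves the ratio unchanged, so a sum-based formula and its divided-by-n form should both agree with this value, up to rounding.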
Regarding "python - Pearson algorithm from Programming Collective Intelligence still not working", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13558529/