Problem description
I'm calculating the PageRank of a set of crawled domains, using a damping factor of 0.85. As many PageRank papers mention, the sum of all PageRank values should converge to 1. But regardless of how many iterations I run, the sum seems to converge at 0.90xxx. If I lower the damping factor to 0.5, the sum moves closer to 1, as expected.
Is it bad that the sum of the page ranks converges at 0.90, and what would this generally imply?
Recommended answer
The algorithm ended up as follows:
// data structures
private HashMap<String, Double> pageRanks;
private HashMap<String, Double> oldRanks;
private HashMap<String, Integer> numberOutlinks;
private HashMap<String, HashMap<String, Integer>> inlinks;
private HashSet<String> domainsWithNoOutlinks;
private double N;

// data parsing occluded

public void startAlgorithm() {
    int maxIterations = 20;
    int itr = 0;
    double d = 0.85;
    double dp = 0;
    double dpp = (1 - d) / N;

    // initialize pagerank
    for (String s : oldRanks.keySet()) {
        oldRanks.put(s, 1.0 / N);
    }

    System.out.println("Starting page rank iterations..");
    while (maxIterations >= itr) {
        System.out.println("Iteration: " + itr);
        dp = 0;

        // teleport probability
        for (String domain : domainsWithNoOutlinks) {
            dp = dp + d * oldRanks.get(domain) / N;
        }

        for (String domain : oldRanks.keySet()) {
            pageRanks.put(domain, dp + dpp);
            for (String inlink : inlinks.get(domain).keySet()) { // for every inlink of domain
                pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));
            }
        }

        // update pageranks with new values
        for (String domain : pageRanks.keySet()) {
            oldRanks.put(domain, pageRanks.get(domain));
        }
        itr++;
    }
}
The important line is this one:
pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));
inlinks.get(domain).get(inlink) returns how many times the inlinking domain links to (references) the current domain, and we divide that by numberOutlinks.get(inlink), the total number of outlinks the inlinking domain has. The factor inlinks.get(domain).get(inlink) is what I had missed in my algorithm, which is why the sum didn't converge to 1: without it, a domain that links to the same target several times passes on only part of its rank, and the rest is lost in every iteration.
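To illustrate the effect, here is a minimal sketch on a made-up four-domain graph. It is not the original crawler code: the class name PageRankSumDemo, the method rankSum, the domains A to D and their link counts are all hypothetical. It also assumes that numberOutlinks counts every link occurrence, duplicates included, which is what the fix above implies. The same update rule is run twice, once with the link-count factor and once without it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PageRankSumDemo {

    // Runs the iteration on a tiny hypothetical graph and returns the sum of all ranks.
    static double rankSum(boolean useLinkCounts) {
        double d = 0.85;

        // inlinks.get(target).get(source) = number of links from source to target
        Map<String, Map<String, Integer>> inlinks = new HashMap<>();
        for (String s : new String[] {"A", "B", "C", "D"}) {
            inlinks.put(s, new HashMap<>());
        }
        inlinks.get("B").put("A", 2); // A links to B twice
        inlinks.get("C").put("A", 1);
        inlinks.get("C").put("B", 1);
        inlinks.get("A").put("C", 1);
        inlinks.get("D").put("C", 1);

        // Assumption: outlink totals count every link occurrence, duplicates included.
        Map<String, Integer> numberOutlinks = new HashMap<>();
        numberOutlinks.put("A", 3);
        numberOutlinks.put("B", 1);
        numberOutlinks.put("C", 2);

        // D has no outlinks, so its rank is spread evenly over all domains.
        Set<String> domainsWithNoOutlinks = new HashSet<>();
        domainsWithNoOutlinks.add("D");

        double n = inlinks.size();
        Map<String, Double> oldRanks = new HashMap<>();
        for (String s : inlinks.keySet()) {
            oldRanks.put(s, 1.0 / n);
        }

        for (int itr = 0; itr < 50; itr++) {
            double dangling = 0;
            for (String s : domainsWithNoOutlinks) {
                dangling += d * oldRanks.get(s) / n;
            }

            Map<String, Double> newRanks = new HashMap<>();
            for (String domain : inlinks.keySet()) {
                double rank = dangling + (1 - d) / n;
                for (Map.Entry<String, Integer> e : inlinks.get(domain).entrySet()) {
                    // The factor that was missing: how often the source links to this domain.
                    int weight = useLinkCounts ? e.getValue() : 1;
                    rank += weight * d * oldRanks.get(e.getKey()) / numberOutlinks.get(e.getKey());
                }
                newRanks.put(domain, rank);
            }
            oldRanks = newRanks;
        }

        double sum = 0;
        for (double r : oldRanks.values()) {
            sum += r;
        }
        return sum;
    }
}
```

Calling PageRankSumDemo.rankSum(true) should give a sum of 1 up to floating-point rounding, while rankSum(false) stays below 1, because domain A passes on only two thirds of its damped rank in each iteration.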
Read more: http://www.ccs.northeastern.edu/home/daikeshi/notes/PageRank.pdf