Ordering Wikidata results by something like PageRank


Question

In Wikidata (using the Wikidata SPARQL endpoint), is there a way to order SPARQL query results by something like PageRank?

SELECT DISTINCT ?entity ?entityLabel WHERE {
    ?entity wdt:P31 wd:Q5.
    SERVICE wikibase:label {
     bd:serviceParam wikibase:language "en" .
    }
} LIMIT 100 OFFSET 0

Can we specify a field to order the results by, such that the entity at the top is more notable/important/recognizable than the one after it, and so on?

Recommended answer

PageRank does not seem to make much sense for Wikidata. Large classes and large aggregates will inevitably come out on top.
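To illustrate why class-like items end up leading, here is a minimal sketch of plain power-iteration PageRank on a toy graph. The graph, the node names, and the damping factor of 0.85 are assumptions made for illustration; this is not how the Wikidata scores quoted in this answer were computed.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {node: [targets]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: five species items point at one class-like item,
# while a single person item points at a country item.
toy_links = {
    "species_1": ["animal"],
    "species_2": ["animal"],
    "species_3": ["animal"],
    "species_4": ["animal"],
    "species_5": ["animal"],
    "person_1": ["country"],
}

scores = pagerank(toy_links)
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

In this toy graph the class-like node collects rank from every instance that points at it, so it ends up ahead of everything else regardless of how "notable" it is.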

Also, unlike web links, RDF predicates are "navigable" in both directions; which URI is the subject and which is the object is just a matter of design.

Nevertheless, Andreas Thalhammer continues his work on PageRank scores for Wikidata. The top 10 Wikidata entities are:

Q729    animal      24996.77
Q30     USA         24772.45
Q1360   Arthropoda  16930.883
Q1390   insects     16531.822
Q35409  family      14403.091
Q756    plant       14019.927
Q142    France      13723.484
Q34740  genus       13718.484
Q16     Canada      12321.178
Q159    Russia      11707.16

Unfortunately, these Wikidata PageRank scores are not published on the (same) endpoint, so one cannot query them using SPARQL.

Fortunately, one can work out some kind of rank oneself. Possible options are:

  1. Number of outgoing statements (precalculated);
  2. Number of sitelinks (precalculated);
  3. Number of incoming statements (in the example below, only truthy statements are counted).

Example query:

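SELECT ?item ?itemLabel ?outcoming ?sitelinks ?incoming {
    # Items that are members (P463) of the European Union (Q458)
    ?item wdt:P463 wd:Q458 .
    # Precalculated counts of statements and sitelinks
    ?item wikibase:statements ?outcoming .
    ?item wikibase:sitelinks ?sitelinks .
    {
        # Count incoming statements per item
        SELECT (COUNT(?s) AS ?incoming) ?item WHERE {
            ?item wdt:P463 wd:Q458 .
            ?s ?p ?item .
            # Restrict ?p to direct ("truthy") claim predicates
            [] wikibase:directClaim ?p
        } GROUP BY ?item
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY DESC(?incoming)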

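If no single metric is convincing on its own, the three counts can also be combined on the client side. The following is a minimal sketch, assuming the query results have already been fetched in the standard SPARQL JSON results format. The helper names (`parse_bindings`, `order_by_mean_rank`), the mean-rank heuristic, and the sample values are all hypothetical and chosen only for illustration.

```python
def parse_bindings(results):
    """Turn SPARQL JSON results into plain dicts with integer metrics."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            "item": binding["item"]["value"],
            "label": binding.get("itemLabel", {}).get("value", ""),
            "outcoming": int(binding["outcoming"]["value"]),
            "sitelinks": int(binding["sitelinks"]["value"]),
            "incoming": int(binding["incoming"]["value"]),
        })
    return rows


def order_by_mean_rank(rows, metrics=("outcoming", "sitelinks", "incoming")):
    """Sort rows by the sum of their positions in each per-metric ordering."""
    positions = {row["item"]: 0 for row in rows}
    for metric in metrics:
        ordered = sorted(rows, key=lambda r: r[metric], reverse=True)
        for position, row in enumerate(ordered):
            positions[row["item"]] += position
    return sorted(rows, key=lambda r: positions[r["item"]])


def literal(value):
    """Build one binding value the way the JSON results format encodes it."""
    return {"type": "literal", "value": str(value)}


# Made-up sample in SPARQL JSON results shape; the numbers are not real data.
sample = {"results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/C"},
     "itemLabel": literal("C"), "outcoming": literal(100),
     "sitelinks": literal(150), "incoming": literal(1000)},
    {"item": {"type": "uri", "value": "http://example.org/B"},
     "itemLabel": literal("B"), "outcoming": literal(200),
     "sitelinks": literal(280), "incoming": literal(5000)},
    {"item": {"type": "uri", "value": "http://example.org/A"},
     "itemLabel": literal("A"), "outcoming": literal(300),
     "sitelinks": literal(250), "incoming": literal(9000)},
]}}

ranked = order_by_mean_rank(parse_bindings(sample))
print([row["label"] for row in ranked])
```

Summing positions rather than raw counts keeps one very large metric (typically the incoming count) from drowning out the other two; whether that is desirable depends on what "important" should mean for the use case.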

As of October 2017, all these metrics are more or less correlated.

Below are the correlation coefficients between these measures for the EU member states.

Pearson
-------
          outcoming sitelinks incoming pagerank
outcoming    1.0000    0.6907   0.7416   0.8652
sitelinks    0.6907    1.0000   0.4314   0.5717
incoming     0.7416    0.4314   1.0000   0.8978
pagerank     0.8652    0.5717   0.8978   1.0000


Spearman
--------
          outcoming sitelinks incoming pagerank
outcoming    1.0000    0.6869   0.7619   0.8736
sitelinks    0.6869    1.0000   0.7680   0.8342
incoming     0.7619    0.7680   1.0000   0.8872
pagerank     0.8736    0.8342   0.8872   1.0000


Kendall
-------
          outcoming sitelinks incoming pagerank
outcoming    1.0000    0.4914   0.5661   0.7143
sitelinks    0.4914    1.0000   0.5764   0.6454
incoming     0.5661    0.5764   1.0000   0.7249
pagerank     0.7143    0.6454   0.7249   1.0000
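For readers who want to run a comparison of this kind on their own result set, the three coefficients can be computed in plain Python. This is a minimal sketch: the function names and the sample numbers are hypothetical, the Kendall variant shown is tau-b, and the source does not say which tool or variant produced the tables above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, counting concordant and discordant pairs directly."""
    pairs = list(combinations(zip(x, y), 2))
    concordant = discordant = ties_x = ties_y = 0
    for (a1, b1), (a2, b2) in pairs:
        product = (a1 - a2) * (b1 - b2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        if a1 == a2:
            ties_x += 1
        if b1 == b2:
            ties_y += 1
    total = len(pairs)
    return (concordant - discordant) / sqrt((total - ties_x) * (total - ties_y))


# Made-up values: same ordering in both lists, but one extreme outlier.
incoming = [10, 20, 30, 40, 1000]
sitelinks = [1, 2, 3, 4, 5]

r_pearson = pearson(incoming, sitelinks)
r_spearman = spearman(incoming, sitelinks)
r_kendall = kendall_tau_b(incoming, sitelinks)
print(r_pearson, r_spearman, r_kendall)
```

In this sample the two lists order the items identically, so the rank-based coefficients are 1 while Pearson is pulled well below 1 by the outlier. An effect of this kind is one possible reason why the sitelinks/incoming pair scores much lower under Pearson than under Spearman in the tables above.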


See also:

  • https://phabricator.wikimedia.org/T143424
  • https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API#PageRank
  • https://phabricator.wikimedia.org/T162279


08-24 03:31