Problem description
The ghtorrent-bq data is great for having a snapshot of GitHub; however, it is not clear when it is updated or how I could get more up-to-date data.
(related to https://stackoverflow.com/a/42930963/132438)
GHTorrent only provides a periodic snapshot of its data on BigQuery, while GitHub Archive updates daily (or even hourly - let me check that).
It would be great to have a more frequent snapshot of GHTorrent (maybe https://twitter.com/gousiosg can help), but in the meantime you can merge both datasets (look for the GHTorrent snapshot data, and then add the latest stars from GitHub Archive):
#standardSQL
-- COUNT(DISTINCT ...) deduplicates users who appear in both sources.
SELECT COUNT(DISTINCT login) c
FROM (
  -- Stargazers recorded in the GHTorrent snapshot of 2017-01-19
  SELECT login
  FROM (
    SELECT login
    FROM `ghtorrent-bq.ght_2017_01_19.watchers` a
    JOIN `ghtorrent-bq.ght_2017_01_19.projects` b
    ON a.repo_id=b.id
    JOIN `ghtorrent-bq.ght_2017_01_19.users` c
    ON a.user_id=c.id
    WHERE url = 'https://api.github.com/repos/angular/angular'
  )
  UNION ALL (
    -- Stars recorded by GitHub Archive during 2017
    SELECT actor.login
    FROM `githubarchive.month.2017*`
    WHERE repo.name='angular/angular'
    AND type = "WatchEvent"
  )
)
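
The query relies on one pattern: stack the two sources with UNION ALL, then let COUNT(DISTINCT login) collapse users who show up in both. Here is a minimal sketch of that pattern on toy data, using Python's built-in sqlite3 instead of BigQuery so it runs locally. The table names `snapshot` and `recent`, the helper `count_distinct_stargazers`, and the sample logins are all made up for illustration; they are not part of either dataset.

```python
import sqlite3


def count_distinct_stargazers(snapshot_logins, recent_logins):
    """Count unique logins across a snapshot list and a recent-events list.

    Hypothetical helper: mirrors the UNION ALL + COUNT(DISTINCT) shape of the
    BigQuery query, but against an in-memory SQLite database.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Stand-ins for the GHTorrent snapshot and the GitHub Archive events.
        conn.execute("CREATE TABLE snapshot (login TEXT)")
        conn.execute("CREATE TABLE recent (login TEXT)")
        conn.executemany(
            "INSERT INTO snapshot VALUES (?)",
            [(login,) for login in snapshot_logins],
        )
        conn.executemany(
            "INSERT INTO recent VALUES (?)",
            [(login,) for login in recent_logins],
        )
        # UNION ALL keeps duplicates; COUNT(DISTINCT ...) removes them.
        row = conn.execute(
            """
            SELECT COUNT(DISTINCT login)
            FROM (
              SELECT login FROM snapshot
              UNION ALL
              SELECT login FROM recent
            )
            """
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # "bob" is in both sources and must be counted only once.
    print(count_distinct_stargazers(["alice", "bob"], ["bob", "carol"]))
```

Because the deduplication happens in the outer COUNT(DISTINCT ...), it does not matter that the GitHub Archive rows overlap in time with the snapshot.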