Question
I want to collect data from Twitter over a period of several weeks.
To do so, I use RStudio Server and crontab to automatically run several scripts like the following:
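# Load the packages used for authentication, searching, and reshaping.
library(ROAuth)
library(twitteR)
library(plyr)
# Load the saved OAuth credentials (object `cred`) and register them.
load("twitter_authentication.Rdata")
registerTwitterOAuth(cred)
# Search for yesterday's tweets matching the term.
searchResults <- searchTwitter("#hashtag", n = 15000,
                               since = as.character(Sys.Date() - 1),
                               until = as.character(Sys.Date()))
head(searchResults)
# Convert the list of status objects to a data frame and save it as CSV.
tweetsDf <- ldply(searchResults, function(t) t$toDataFrame())
write.csv(tweetsDf, file = paste("tweets_test_", Sys.Date() - 1, ".csv", sep = ""))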
On some days there are only a few tweets (up to 100) per hashtag, so the script runs smoothly. On other days, however, there are thousands of tweets for a given hashtag (of course I am not literally searching for "hashtag" but for the term I need for my study).
I can add retryOnRateLimit = 10 to searchTwitter. But when I search for multiple hashtags every day, how should I time these queries in crontab?
To organize these queries, I need to know how many tweets I can collect by running the script once within a 15-minute interval. Does anybody know the answer? (Of course, the Twitter API rate limits cap what I can do in that interval, but how many tweets is that?)
Recommended answer
Rather than performing a search every few minutes, you should use the Streaming API.
This gives you a real-time feed of the data flowing through Twitter, and you can set a filter for your search term.
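As a rough illustration of a filtered stream in R, here is a minimal sketch. It assumes the streamR package (not mentioned in the thread) is installed, that the filtered streaming endpoint is available to your account, and that `cred` is the ROAuth credential object saved in twitter_authentication.Rdata. The function name `collect_stream` and its arguments are hypothetical.

```r
# Sketch only: streamR is an assumption, not something the thread names.
# `cred` is assumed to be the ROAuth credential object from the question.
collect_stream <- function(term, seconds, cred,
                           out_file = tempfile(fileext = ".json")) {
  if (!requireNamespace("streamR", quietly = TRUE)) {
    stop("the streamR package is required for this sketch")
  }
  # Hold one persistent connection open for `seconds` seconds and
  # write every tweet matching `term` to out_file as raw JSON.
  streamR::filterStream(file.name = out_file, track = term,
                        timeout = seconds, oauth = cred)
  # Parse the collected JSON into a data frame.
  streamR::parseTweets(out_file)
}

# Hypothetical usage (not run here, needs valid credentials):
# load("twitter_authentication.Rdata")
# tweetsDf <- collect_stream("#hashtag", seconds = 3600, cred = cred)
# write.csv(tweetsDf, file = paste0("tweets_stream_", Sys.Date(), ".csv"))
```

With this approach crontab would only need to restart the collector if the connection drops, rather than schedule one search per term.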
There is no rate limit as such: you make a single persistent connection, and Twitter delivers a sample of all the tweets matching your search term.
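If you stay with the search API instead, the per-window budget the question asks about is the number of requests allowed per window multiplied by the number of tweets returned per request. The sketch below assumes 180 search requests per 15-minute window and at most 100 tweets per request. Both figures are assumptions based on the REST API v1.1 search limits, not values taken from this thread, so check them against the current documentation. The function names are hypothetical.

```r
# Rough per-window budget for the search API.
# Both defaults are assumptions (REST API v1.1 search limits).
tweets_per_window <- function(requests_per_window = 180,
                              tweets_per_request = 100) {
  requests_per_window * tweets_per_request
}

# Number of 15-minute windows needed to fetch `n` tweets for one term.
windows_needed <- function(n, budget = tweets_per_window()) {
  ceiling(n / budget)
}

tweets_per_window()     # 18000 under the assumed limits
windows_needed(15000)   # 1
windows_needed(40000)   # 3
```

Under these assumptions one run with n = 15000 fits in a single window, so crontab entries for different terms would need to start at least 15 minutes apart, and further apart for terms that need more than one window.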