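I have the following code, which retrieves Twitter streaming data and writes it to a JSON file. What I would like is to stop collecting data after, for example, 1000 tweets. How can I set up the code to do that?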

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Other libs
import json

#Variables that contains the user credentials to access Twitter API
access_token = "XXX"
access_token_secret = "XXX"
consumer_key = "XXX"
consumer_secret = "XXX"

#This is a basic listener that appends received tweets to a JSON file.
class StdOutListener(StreamListener):

    def on_data(self, data):

        try:
            tweet = json.loads(data)
            with open('your_data.json', 'a') as my_file:
                json.dump(tweet, my_file)


        except BaseException:
            print('Error')
            pass

    def on_error(self, status):
        print ("Error " + str(status))
        if status == 420:
            print("Rate Limited")
            return False


if __name__ == '__main__':

    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)


    stream.filter(track=['Euro2016', 'FRA', 'POR'], languages=['en'])

Best answer

Here is a possible solution:

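import sys  # needed for sys.exit() below

class StdOutListener(StreamListener):

    tweet_number=0   # class variable

    def __init__(self,max_tweets):
        super(StdOutListener, self).__init__()  # run the base class set-up as well
        self.max_tweets=max_tweets # max number of tweets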

    def on_data(self, data):
        self.tweet_number+=1
        try:
            tweet = json.loads(data)
            with open('your_data.json', 'a') as my_file:
                json.dump(tweet, my_file)
        except BaseException:
            print('Error')
            pass
        if self.tweet_number>=self.max_tweets:
            sys.exit('Limit of '+str(self.max_tweets)+' tweets reached.')

    def on_error(self, status):
        print ("Error " + str(status))
        if status == 420:
            print("Rate Limited")
            return False

l = StdOutListener(1000) # Here you can set your maximum number of tweets (1000 in this example)


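After defining the class variable tweet_number, I used the __init__() method to initialize a new StdOutListener object with the maximum number of tweets to collect. tweet_number is incremented by 1 each time on_data(data) is called, so the program terminates once tweet_number >= max_tweets.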
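
The same counting idea can also be tried out without calling sys.exit(). What follows is only a minimal, self-contained sketch, not the code from the answer: CountingListener and feed are hypothetical stand-ins (no tweepy import, no network), and feed merely imitates a stream loop that stops once on_data returns False, the same convention the on_error handler relies on with its return False. Writing one JSON object per line is also an assumption about the desired file layout.

```python
import json
import os
import tempfile


class CountingListener(object):
    """Hypothetical stand-in for a StreamListener subclass."""

    def __init__(self, max_tweets, path):
        self.max_tweets = max_tweets  # stop after this many tweets
        self.path = path              # output file, one JSON object per line
        self.tweet_number = 0         # counter kept on the instance

    def on_data(self, data):
        try:
            tweet = json.loads(data)
        except ValueError:
            # Payloads that are not valid JSON are skipped and not counted.
            return True
        with open(self.path, 'a') as my_file:
            json.dump(tweet, my_file)
            my_file.write('\n')
        self.tweet_number += 1
        # False tells the caller to stop delivering data.
        return self.tweet_number < self.max_tweets


def feed(listener, messages):
    """Hypothetical stream loop: stops as soon as on_data returns False."""
    delivered = 0
    for raw in messages:
        delivered += 1
        if listener.on_data(raw) is False:
            break
    return delivered


def demo():
    """Feed ten fake tweets to a listener limited to three."""
    fd, path = tempfile.mkstemp(suffix='.json')
    os.close(fd)
    try:
        listener = CountingListener(max_tweets=3, path=path)
        messages = ['{"id": %d}' % i for i in range(10)]
        delivered = feed(listener, messages)
        with open(path) as saved_file:
            saved = [json.loads(line) for line in saved_file]
    finally:
        os.remove(path)
    return delivered, saved


if __name__ == '__main__':
    print(demo())
```

In this sketch the counter only advances for payloads that were actually written, so the limit refers to saved tweets rather than to every call of on_data.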

P.S. The code needs import sys to work, because it calls sys.exit().

Regarding python - stopping Twitter streaming data collection, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38297150/
