Problem description
I'm trying to pull tweets from a user's timeline in real time, and then run some analysis on those tweets. Having read the docs, it looks like I need tweepy.Stream for this use case. I've done the following:
stream.filter(follow=['25073877'])
But Twitter's filter API documentation states that, for each followed user, the stream will contain the following:
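- Tweets created by the user.
- Tweets which are retweeted by the user.
- Replies to any Tweet created by the user.
- Retweets of any Tweet created by the user.
- Manual replies, created without pressing a reply button (e.g. "@twitterapi I agree").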
It seems that this will return a huge volume of tweets that aren't relevant to my use case. Do I have to use this approach and then filter by screen name to get only the tweets written by the user themselves? This doesn't seem right at all.
The alternative seems to be the api.user_timeline method, but that isn't a streaming API. Do I therefore use this API and hit it every second? I can't seem to find suitable examples of how best to accomplish my use case.
Recommended answer
Yes, you'll need to filter the stream yourself, either by screen_name or by checking whether each tweet is a retweet.
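As a rough illustration of that client-side filter, here is a minimal sketch that works on the decoded JSON of a tweet. It assumes the standard v1.1 tweet payload fields ("user", "id_str", "retweeted_status"), and it compares the numeric user ID instead of the screen name, since that is the value already passed to follow. The function name is_own_original_tweet and the sample payloads are made up for the example.

```python
def is_own_original_tweet(status: dict, user_id: str) -> bool:
    """Return True if `status` was written by `user_id` and is not a retweet."""
    # The author of a tweet lives in the nested "user" object.
    author_id = status.get("user", {}).get("id_str")
    if author_id != user_id:
        # Written by somebody else: a reply to, or a retweet of, the followed user.
        return False
    # A retweet made by the user carries the original under "retweeted_status".
    if "retweeted_status" in status:
        return False
    return True


# Hand-written sample payloads (illustrative only, not real API output).
samples = [
    {"text": "hello", "user": {"id_str": "25073877"}},
    {"text": "RT something", "user": {"id_str": "25073877"},
     "retweeted_status": {"text": "something"}},
    {"text": "@someone I agree", "user": {"id_str": "999"}},
]

kept = [s["text"] for s in samples if is_own_original_tweet(s, "25073877")]
print(kept)
```

In a tweepy listener, the same check could be applied to each incoming tweet before handing it to the analysis code.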
I wouldn't recommend the second approach. You'd end up receiving even more tweets, because you'd have to filter out the ones you already received in previous requests, and you may hit the API rate limits if you don't time the requests properly.
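For anyone who polls anyway, the duplicate problem is usually handled by remembering the highest tweet ID seen so far and asking only for newer ones. The sketch below shows that bookkeeping with a stand-in fetch function. The names fetch_new_tweets, fake_fetch and TIMELINE are hypothetical. With tweepy, the fetch callable would presumably wrap api.user_timeline and pass the remembered ID as since_id, with a pause between polls to stay within the rate limits.

```python
from typing import Callable, List, Optional, Tuple


def fetch_new_tweets(
    fetch: Callable[[Optional[int]], List[dict]],
    since_id: Optional[int],
) -> Tuple[List[dict], Optional[int]]:
    """Fetch tweets newer than `since_id` and return them with the updated ID."""
    tweets = fetch(since_id)
    if tweets:
        # Remember the newest ID so the next poll skips everything seen so far.
        since_id = max(t["id"] for t in tweets)
    return tweets, since_id


# Stand-in for a real timeline (illustrative data only).
TIMELINE = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": "c"}]


def fake_fetch(since_id: Optional[int]) -> List[dict]:
    """Hypothetical replacement for an API call that accepts a since_id."""
    return [t for t in TIMELINE if since_id is None or t["id"] > since_id]


first, last_seen = fetch_new_tweets(fake_fetch, None)
TIMELINE.append({"id": 4, "text": "d"})  # a new tweet arrives between polls
second, last_seen = fetch_new_tweets(fake_fetch, last_seen)
third, last_seen = fetch_new_tweets(fake_fetch, last_seen)
print([t["text"] for t in second])
```

This removes the need to deduplicate by hand, but it does not remove the rate-limit concern, which is why the streaming approach is still the one recommended here.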
This is the signature of the filter function:
def filter(self, follow=None, track=None, is_async=False, locations=None,
           stall_warnings=False, languages=None, encoding='utf8', filter_level=None)
It maps to this Twitter API request.
Here is the description of the parameters.