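I am trying to collect every public tweet for a hashtag, but my code never gets past 299 tweets.

I am also trying to get tweets from a specific time range, for example only tweets between May 2015 and July 2016. Is there a way to do this in the main process, or should I write some extra code for it?

Here is my code: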

# if this is the first time, creates a new array which
# will store max id of the tweets for each keyword
if not os.path.isfile("max_ids.npy"):
    max_ids = np.empty(len(keywords))
    # every value is initialized to -1 so that the first run starts from the beginning
    max_ids.fill(-1)
else:
    max_ids = np.load("max_ids.npy")  # loads the previous max ids

# if new keywords were added, extend max_ids so that every keyword has an entry
if len(keywords) > len(max_ids):
    new_indexes = np.empty(len(keywords) - len(max_ids))
    new_indexes.fill(-1)
    max_ids = np.append(arr=max_ids, values=new_indexes)

count = 0
for i in range(len(keywords)):
    since_date="2015-01-01"
    sinceId = None
    tweetCount = 0
    maxTweets = 5000000000000000000000  # maximum tweets to find per keyword
    tweetsPerQry = 100
    searchQuery = "#{0}".format(keywords[i])
    while tweetCount < maxTweets:
        if max_ids[i] < 0:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        since_id=sinceId)
        else:
            # use this keyword's own entry; cast the NumPy float to int before formatting
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        max_id=str(int(max_ids[i]) - 1))
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        max_id=str(int(max_ids[i]) - 1),
                                        since_id=sinceId)
        if not new_tweets:
            print("Keyword: {0}      No more tweets found".format(searchQuery))
            break
        for tweet in new_tweets:
            count += 1
            print(count)

            file_write.write(
                       .
                       .
                       .
                         )

            item = {
                .
                .
                .
                .
                .
            }

            # instead of using mongo's id for _id, using tweet's id
            raw_data = tweet._json
            raw_data["_id"] = tweet.id
            raw_data.pop("id", None)

            try:
                db["Tweets"].insert_one(item)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'Tweets' collection.")
            try:
                db["RawTweets"].insert_one(raw_data)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'RawTweets' collection.")

        tweetCount += len(new_tweets)
        print("Downloaded {0} tweets".format(tweetCount))
        max_ids[i] = new_tweets[-1].id

np.save(arr=max_ids, file="max_ids.npy")  # saved so the next run can continue mining where this one stopped

Best answer

Sorry, I could not answer in a comment, it was too long. :)

Sure :) Check this example:
Advanced search for the #data keyword, May 2015 - July 2016
This gives the URL: https://twitter.com/search?l=&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd

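import requests

session = requests.session()
keyword = 'data'
date1 = '2015-05-01'
date2 = '2016-07-31'  # dates must be strings
# splice the variables into the URL instead of leaving their names inside the literal
url = ('https://twitter.com/search?l=&q=%23' + keyword +
       '%20since%3A' + date1 + '%20until%3A' + date2 + '&src=typd')
response = session.get(url, stream=True)  # the requests keyword is stream, not streaming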
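The search URL can also be assembled with the standard library instead of hand-written percent escapes. This is only a minimal sketch: `build_search_url` is a hypothetical helper name, not part of requests or tweepy, and it assumes the URL layout shown in the example above.

```python
from urllib.parse import quote


def build_search_url(keyword, since, until):
    """Build a twitter.com search URL for a hashtag within a date range."""
    # The query as it would be typed in the search box: "#data since:... until:..."
    query = "#{0} since:{1} until:{2}".format(keyword, since, until)
    # safe="" makes quote() percent-encode '#', ':' and spaces
    return "https://twitter.com/search?l=&q={0}&src=typd".format(quote(query, safe=""))


url = build_search_url("data", "2015-05-01", "2016-07-31")
print(url)
```

The resulting string can be passed to `session.get(...)` in the same way as a hand-built URL.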

Now we have all the requested tweets,
though you may run into a "pagination" problem.
Pagination URL ->

https://twitter.com/i/search/timeline?vertical=news&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd&include_available_features=1&include_entities=1&max_position=TWEET-759522481271078912-759538448860581892-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&reset_error_state=false

You could probably put in a random tweet id, or parse the first page first, or request some data from Twitter. It can be done.
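One way to experiment with the pagination URL is to swap its `max_position` query parameter and leave the rest untouched. The sketch below does only that string manipulation. `with_max_position` is a hypothetical helper, the shortened URL and the `TWEET-...` cursor values are placeholders, and how to obtain a valid cursor for the next page is not covered by this answer.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit


def with_max_position(url, max_position):
    """Return a copy of url with its max_position query parameter replaced."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as "l="
    params = parse_qs(parts.query, keep_blank_values=True)
    params["max_position"] = [max_position]
    # quote_via=quote keeps spaces as %20 rather than '+'
    query = urlencode(params, doseq=True, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


# Placeholder URL and cursor values, shortened for illustration only
page_url = ("https://twitter.com/i/search/timeline?vertical=news"
            "&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31"
            "&src=typd&max_position=TWEET-1-2")
next_url = with_max_position(page_url, "TWEET-3-4")
print(next_url)
```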

Use the Network tab in Chrome to find all the information about the requests :)

Regarding "python - How to get all tweets in a hashtag with tweepy?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44948628/
