Focusing in on specific results while scraping Twitter with Python and Beautiful Soup 4?

Problem description

This is a follow-up to my post "Using Python to Scrape Nested Divs and Spans in Twitter?".

I'm not using the Twitter API because it doesn't look at tweets by hashtag this far back. Complete code and output are below, after the examples.

I want to scrape specific data from each tweet. name and handle retrieve exactly what I'm looking for, but I'm having trouble narrowing down the rest of the elements.

As an example:

 link = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
 url = link[0]

returns this:

 <a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/Mikepeeljourno/status/648787700980408320" title="2:13 AM - 29 Sep 2015">
 <span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1443518016" data-time-ms="1443518016000">29 Sep 2015</span></a>

For url, I only need the href value from the first line.
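For reference, a minimal sketch of pulling only the href out of that element, run against the sample markup above rather than the live page. The variable names tweet_path and tweet_title are illustrative, and the shortened class match is an assumption that no other anchor on the page carries the tweet-timestamp class.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question (attributes trimmed).
html = '''<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/Mikepeeljourno/status/648787700980408320" title="2:13 AM - 29 Sep 2015">
<span class="_timestamp js-short-timestamp " data-time="1443518016">29 Sep 2015</span></a>'''

soup = BeautifulSoup(html, "html.parser")

# class is a multi-valued attribute, so matching one of its values is enough.
link = soup.find("a", class_="tweet-timestamp")

# A Tag can be indexed like a dict to read its attributes.
tweet_path = link["href"]
# .get() returns None instead of raising KeyError if the attribute is absent.
tweet_title = link.get("title")

print(tweet_path)
print(tweet_title)
```

The href is a relative path, so it would still need the site's base URL prepended before it can be requested.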

Similarly, the retweets and favorites commands return large chunks of html, when all I really need is the numerical value that is displayed for each one.
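One possible way to get just the number is to search again inside the button for the span that holds the displayed count, then read its text. The sketch below uses the retweet button markup from the output in this question. Treating an empty span as 0 is an assumption, not something the page guarantees.

```python
from bs4 import BeautifulSoup

# Retweet button markup copied from the question's output (icon markup omitted).
html = '''<button class="ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button">
<div class="IconTextContainer">
<span class="ProfileTweet-actionCount">
<span aria-hidden="true" class="ProfileTweet-actionCountForPresentation">4</span>
</span>
</div>
</button>'''

soup = BeautifulSoup(html, "html.parser")

button = soup.find("button", class_="js-actionRetweet")

# Search within the button only, so the favorite count is never picked up by mistake.
count_span = button.find("span", class_="ProfileTweet-actionCountForPresentation")

# get_text(strip=True) drops the surrounding whitespace and newlines.
text = count_span.get_text(strip=True)

# Assumption: an empty span means no retweets yet.
retweet_count = int(text) if text else 0

print(retweet_count)
```

The same two lookups with js-actionFavorite in place of js-actionRetweet would give the favorite count. Note that int() raises ValueError on non-numeric text, so any abbreviated counts would need extra handling.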

How can I narrow down the results to the required data for the url, retweetcount and favcount outputs?

I am planning to have this cycle through all the tweets once I get it working, in case that has an influence on your suggestions.
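For the looping case, one approach is to find each tweet's container first and then search inside it, so that the url and the counts always belong to the same tweet. The sketch below is only an outline under stated assumptions: the wrapper div with class tweet is hypothetical (the real container class is not shown in this question), the second tweet's values are made-up placeholders, and count_from is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Two simplified tweet containers. The "tweet" wrapper class and every value in
# the second container are placeholders for illustration, not real page data.
html = '''
<div class="tweet">
  <a class="tweet-timestamp js-permalink" href="/Mikepeeljourno/status/648787700980408320">29 Sep 2015</a>
  <button class="js-actionButton js-actionRetweet"><span class="ProfileTweet-actionCountForPresentation">4</span></button>
  <button class="js-actionButton js-actionFavorite"><span class="ProfileTweet-actionCountForPresentation">2</span></button>
</div>
<div class="tweet">
  <a class="tweet-timestamp js-permalink" href="/example_user/status/1">1 Jan 2015</a>
  <button class="js-actionButton js-actionRetweet"><span class="ProfileTweet-actionCountForPresentation"></span></button>
  <button class="js-actionButton js-actionFavorite"><span class="ProfileTweet-actionCountForPresentation">7</span></button>
</div>
'''


def count_from(tweet, button_class):
    """Return the integer shown inside the given action button, or 0 if empty or missing."""
    selector = "button.%s span.ProfileTweet-actionCountForPresentation" % button_class
    span = tweet.select_one(selector)
    text = span.get_text(strip=True) if span else ""
    return int(text) if text.isdigit() else 0


soup = BeautifulSoup(html, "html.parser")

rows = []
for tweet in soup.find_all("div", class_="tweet"):
    # Every lookup starts from this tweet, not from the whole document.
    link = tweet.find("a", class_="tweet-timestamp")
    rows.append({
        "url": link["href"] if link else None,
        "retweetcount": count_from(tweet, "js-actionRetweet"),
        "favcount": count_from(tweet, "js-actionFavorite"),
    })

for row in rows:
    print(row)
```

Collecting separate page-wide lists for links, retweets and favorites and indexing them in parallel can drift out of step if any tweet lacks one of the elements. Scoping each search to its container avoids that.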

Complete code:

 from bs4 import BeautifulSoup
 import requests
 import sys

 url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
 headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
 r = requests.get(url, headers=headers)
 data = r.text.encode('utf-8')
 soup = BeautifulSoup(data, "html.parser")

 name = soup('strong', {'class': 'fullname js-action-profile-name show-popup-with-id'})
 username = name[0].contents[0]

 handle = soup('span', {'class': 'username js-action-profile-name'})
 userhandle = handle[0].contents[1].contents[0]

 link = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
 url = link[0]

 messagetext = soup('p', {'class': 'TweetTextSize  js-tweet-text tweet-text'})
 message = messagetext[0]

 retweets = soup('button', {'class': 'ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet'})
 retweetcount = retweets[0]

 favorites = soup('button', {'class': 'ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite'})
 favcount = favorites[0]

 print (username, "\n", "@", userhandle, "\n", "\n", url, "\n", "\n", message, "\n", "\n", retweetcount, "\n", "\n", favcount) #extra linebreaks for ease of reading

Complete output:

Michael Peel

@Mikepeeljourno

<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/Mikepeeljourno/status/648787700980408320" title="2:13 AM - 29 Sep 2015"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1443518016" data-time-ms="1443518016000">29 Sep 2015</span></a>

<p class="TweetTextSize js-tweet-text tweet-text" data-aria-label-part="0" lang="en"><a class="twitter-hashtag pretty-link js-nav" data-query-source="hashtag_click" dir="ltr" href="/hashtag/FT?src=hash"><s>#</s><b>FT</b></a> Case closed: <a class="twitter-hashtag pretty-link js-nav" data-query-source="hashtag_click" dir="ltr" href="/hashtag/Thailand?src=hash"><s>#</s><b>Thailand</b></a> police chief proclaims <a class="twitter-hashtag pretty-link js-nav" data-query-source="hashtag_click" dir="ltr" href="/hashtag/Bangkokbombing?src=hash"><s>#</s><b><strong>Bangkokbombing</strong></b></a> solved ahead of his retirement this week -even as questions over case grow</p>

<button class="ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button">
<div class="IconContainer js-tooltip" title="Undo retweet">
<span class="Icon Icon--retweet"></span>
<span class="u-hiddenVisually">Retweeted</span>
</div>
<div class="IconTextContainer">
<span class="ProfileTweet-actionCount">
<span aria-hidden="true" class="ProfileTweet-actionCountForPresentation">4</span>
</span>
</div>
</button>

<button class="ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite" type="button">
<div class="IconContainer js-tooltip" title="Undo like">
<div class="HeartAnimationContainer">
<div class="HeartAnimation"></div>
</div>
<span class="u-hiddenVisually">Liked</span>
</div>
<div class="IconTextContainer">
<span class="ProfileTweet-actionCount">
<span aria-hidden="true" class="ProfileTweet-actionCountForPresentation">2</span>
</span>
</div>
</button>

It was suggested that "BeautifulSoup - extracting attribute values" may already answer this question. However, I think that question and its answers do not have sufficient context or explanation to be helpful in more complex situations. The link to the relevant part of the Beautiful Soup documentation is helpful, though: http://www.crummy.com/software/BeautifulSoup/documentation.html#The%20attributes%20of%20Tags

Recommended answer