Problem description
I want to scrape the link and title of all the questions on the page https://www.reddit.com/search?q=Expiration&type=link&sort=new. Each element has the following structure:
<a data-click-id="body" class="SQnoC3ObvgnGjWt90zD9Z" href="/r/excel/comments/ayiahc/calculating_expiration_dates_previous_solution_no/">
<h2 class="s1okktje-0 cDxKta">
<span style="font-weight:normal">Calculating Expiration Dates - Previous Solution No Longer Works</span>
</h2>
</a>
I use questions = driver.find_elements_by_xpath('//a[@data-click-id="body"]') to get the questions, then iterate over them with a for loop. I can use question.get_attribute('href') to get the link.
However, I don't know how to extract the title inside the span from a question.
Does anyone know how to do this?
Recommended answer
Try the following. This looks up the span inside each question element and reads its text:
question.find_element_by_tag_name('span').text
Or simply read the text of the anchor itself, since the span is its only visible content:
question.text
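Putting the question's loop and the answer together, a minimal sketch might look like the following. The function names extract_questions and scrape are hypothetical, and the sketch assumes Chrome with a matching driver is available and that the page still marks result links with data-click-id="body" as in the question. It uses find_elements(By.XPATH, ...), which replaces the find_elements_by_xpath helper in recent Selenium 4 releases.

```python
def extract_questions(anchors):
    """Return a list of (href, title) pairs.

    `anchors` is any iterable of objects that expose the parts of
    Selenium's WebElement interface used here: get_attribute(name)
    and the .text property.
    """
    results = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        # .text on the <a> returns the visible text of its descendants,
        # which here is just the title inside the <span>.
        title = anchor.text.strip()
        results.append((href, title))
    return results


def scrape(url):
    """Open `url` in Chrome and return (href, title) pairs (hypothetical helper)."""
    # Imported here so extract_questions can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        # Same XPath as in the question.
        anchors = driver.find_elements(By.XPATH, '//a[@data-click-id="body"]')
        return extract_questions(anchors)
    finally:
        driver.quit()
```

Calling scrape("https://www.reddit.com/search?q=Expiration&type=link&sort=new") would then return one pair per matched link, provided the page has finished rendering its results by the time the elements are queried.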