Problem description
A simple question. I can scrape results from the first page of a DuckDuckGo search. However, I am struggling to get to the second and subsequent pages. I am using Python with the Selenium WebDriver, which works fine for the first page of results. The code I used to scrape the first page is:
results_url = "https://duckduckgo.com/?q=paralegal&t=h_&ia=web"
browser.get(results_url)
results = browser.find_elements_by_id('links')
num_page_items = len(results)
for i in range(num_page_items):
    print(results[i].text)
    print(len(results))
nxt_page = browser.find_element_by_link_text("Load More")
if nxt_page:
    nxt_page.send_keys(Keys.PAGE_DOWN)
There are line breaks indicating the start of a new page, but they do not appear to alter the URL, so I tried the above to move down the page and then repeat the code for finding the links on the next page. However, it does not work. Any help would be very much appreciated.
Recommended answer
If I search for "Load More" in the source code of the results page, I can't find it. Did you try using the non-JavaScript version?
You can use it by simply adding html to the URL: https://duckduckgo.com/html?q=paralegal&t=h_&ia=web
There you can find the Next button at the end.
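To make the URL change concrete, here is a minimal sketch that builds the non-JavaScript URL for an arbitrary query. The helper name html_results_url is made up for illustration, and the t and ia parameters are simply copied from the URL above; I have not checked whether the html endpoint needs them.

```python
from urllib.parse import urlencode

# Base of the non-JavaScript results page, taken from the URL in the answer.
HTML_ENDPOINT = "https://duckduckgo.com/html"


def html_results_url(query):
    """Build the non-JavaScript results URL for a query (hypothetical helper)."""
    # t and ia are copied from the answer's URL; whether they are required
    # is an assumption that has not been verified.
    params = {"q": query, "t": "h_", "ia": "web"}
    return HTML_ENDPOINT + "?" + urlencode(params)


print(html_results_url("paralegal"))
```

urlencode also takes care of escaping, so a query containing spaces or other special characters still gives a valid URL.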
This one works for me (Chrome version):
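from selenium import webdriver

browser = webdriver.Chrome()
results_url = "https://duckduckgo.com/html?q=paralegal&t=h_&ia=web"
browser.get(results_url)
# Selenium 3 style lookup; Selenium 4 spells this browser.find_elements(By.ID, 'links')
results = browser.find_elements_by_id('links')
num_page_items = len(results)
for i in range(num_page_items):
    print(results[i].text)
    print(len(results))
# The Next button on the non-JavaScript page is an <input value="Next">
nxt_page = browser.find_element_by_xpath('//input[@value="Next"]')
if nxt_page:
    # Scroll the button into view before clicking it
    browser.execute_script('arguments[0].scrollIntoView();', nxt_page)
    nxt_page.click()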
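The snippet above handles a single click on Next. A rough sketch of repeating it over several pages could look like the following. The function name scrape_pages and the max_pages cap are my own additions, not part of the original answer. The locators are written as plain strings ("id" and "xpath" are the values behind Selenium's By.ID and By.XPATH) so that the function works with any object exposing get, find_elements and execute_script. It assumes the Selenium 4 style find_elements(by, value) call, and that click() returns only after the next page has loaded, which may need an explicit wait in practice.

```python
# Locator tuples; ("id", ...) and ("xpath", ...) are equivalent to
# (By.ID, ...) and (By.XPATH, ...) in Selenium.
RESULTS = ("id", "links")
NEXT_BUTTON = ("xpath", '//input[@value="Next"]')


def scrape_pages(browser, start_url, max_pages=3):
    """Return the text of the results container from up to max_pages pages."""
    texts = []
    browser.get(start_url)
    for page in range(max_pages):
        # Look the elements up again on every page: references obtained
        # on the previous page go stale after navigation.
        texts.extend(el.text for el in browser.find_elements(*RESULTS))
        if page == max_pages - 1:
            break  # page limit reached, no need to click Next again
        next_buttons = browser.find_elements(*NEXT_BUTTON)
        if not next_buttons:
            break  # no Next button, so this was the last page
        browser.execute_script("arguments[0].scrollIntoView();", next_buttons[0])
        next_buttons[0].click()
    return texts
```

With a real driver this would be called as scrape_pages(webdriver.Chrome(), "https://duckduckgo.com/html?q=paralegal&t=h_&ia=web"). Using find_elements (plural) for the button gives an empty list on the last page instead of raising NoSuchElementException.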
By the way, DuckDuckGo also provides a nice API, which is probably much easier to use ;)
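For reference, a small sketch of calling that API with only the standard library. It assumes the Instant Answer endpoint at https://api.duckduckgo.com/ with format=json, and the RelatedTopics, Text and FirstURL field names as I remember them from its JSON output, so check the response you actually get. The function names are made up for illustration. Note that this API returns instant answers and related topics rather than the full list of web results, so it does not replace scraping in every case.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the DuckDuckGo Instant Answer API.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Build a JSON request URL for a query (hypothetical helper)."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json", "no_html": 1})


def related_topics(payload):
    """Extract (text, url) pairs from a decoded response.

    Entries without both Text and FirstURL (for example grouped
    sub-sections) are skipped.
    """
    return [
        (topic["Text"], topic["FirstURL"])
        for topic in payload.get("RelatedTopics", [])
        if "Text" in topic and "FirstURL" in topic
    ]


def fetch(query):
    """Perform the request and decode the JSON body (needs network access)."""
    with urlopen(build_api_url(query)) as response:
        return json.load(response)
```

Something like related_topics(fetch("paralegal")) would then give a list of text and link pairs without driving a browser.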
Edit: fixed the selector for the next-page link, which selected the Prev button on the second results page (thanks to @kingbode).