We are trying to figure out a way to use one Mechanize instance to scrape all of the data we want from the UCAS website and add it to arrays. Currently we're struggling with coding the link clicks in Mechanize.
Wondering if anyone can help: there are three successive link clicks, inside loops, needed to progress through all of the search result pages.

The first link, which displays all courses for a university, is within the div with class morecourseslink.
The second link, which displays course names, duration and qualification, is in the div with class coursenamearea.
The third link is in the div coursedetailsshowable, and the id of the a element is coursedetailtab_entryreqs.

Currently we are scraping the university names with the code below:

    class PagesController < ApplicationController
      def home
        require 'mechanize'
        mechanize = Mechanize.new

        @uninames_array = []

        page = mechanize.get('http://search.ucas.com/search/providers?CountryCode=3&RegionCode=&Lat=&Lng=&Feather=&Vac=2&Query=&ProviderQuery=&AcpId=&Location=scotland&IsFeatherProcessed=True&SubjectCode=&AvailableIn=2016')

        page.search('li.result h3').each do |h3|
          name = h3.text
          @uninames_array.push(name)
        end

        while next_page_link = page.at('.pager a[text()=">"]')
          page = mechanize.get(next_page_link['href'])

          page.search('li.result h3').each do |h3|
            name = h3.text
            @uninames_array.push(name)
          end
        end

        puts @uninames_array.to_s
      end
    end

And the course names, duration and qualification with the code below:

    require 'mechanize'
    mechanize = Mechanize.new

    @duration_array = []
    @qual_array = []
    @courses_array = []

    page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')

    page.search('div.courseinfoduration').each do |x|
      puts x.text.strip
    end

    page.search('div.courseinfooutcome').each do |y|
      puts y.text.strip
    end

    while next_page_link = page.at('.pager a[text()=">"]')
      page = mechanize.get(next_page_link['href'])
      page.search('div.courseinfoduration').each do |x|
        name = x
        @duration_array.push(name)
        puts x.text.strip
      end
    end

    while next_page_link = page.at('.pager a[text()=">"]')
      page = mechanize.get(next_page_link['href'])
      page.search('div.courseinfooutcome').each do |y|
        name = y
        @qual_array.push(name)
        puts y.text.strip
      end
    end

    page.search('div.coursenamearea h4').each do |h4|
      puts h4.text.strip
    end

    while next_page_link = page.at('.pager a[text()=">"]')
      page = mechanize.get(next_page_link['href'])
      page.search('div.coursenamearea h4').each do |h4|
        name = h4.text
        @courses_array.push(name)
        puts h4.text.strip
      end
    end

Solution

If you want to do this with one Mechanize instance, why not just string them all together and store the pages you need to jump to and from in variables?

If all your code works, then you can simply string it together into one method call:

    def home
      require 'mechanize'
      mechanize = Mechanize.new

      # University names, collected from every provider results page
      @uninames_array = []

      page = mechanize.get('http://search.ucas.com/search/providers?CountryCode=3&RegionCode=&Lat=&Lng=&Feather=&Vac=2&Query=&ProviderQuery=&AcpId=&Location=scotland&IsFeatherProcessed=True&SubjectCode=&AvailableIn=2016')

      page.search('li.result h3').each do |h3|
        name = h3.text
        @uninames_array.push(name)
      end

      while next_page_link = page.at('.pager a[text()=">"]')
        page = mechanize.get(next_page_link['href'])

        page.search('li.result h3').each do |h3|
          name = h3.text
          @uninames_array.push(name)
        end
      end

      # Course durations, qualifications and names for one provider
      @duration_array = []
      @qual_array = []
      @courses_array = []

      page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')

      page.search('div.courseinfoduration').each do |x|
        puts x.text.strip
      end

      page.search('div.courseinfooutcome').each do |y|
        puts y.text.strip
      end

      while next_page_link = page.at('.pager a[text()=">"]')
        page = mechanize.get(next_page_link['href'])
        page.search('div.courseinfoduration').each do |x|
          # Store the text rather than the Nokogiri node
          @duration_array.push(x.text.strip)
          puts x.text.strip
        end
      end

      while next_page_link = page.at('.pager a[text()=">"]')
        page = mechanize.get(next_page_link['href'])
        page.search('div.courseinfooutcome').each do |y|
          # Store the text rather than the Nokogiri node
          @qual_array.push(y.text.strip)
          puts y.text.strip
        end
      end

      page.search('div.coursenamearea h4').each do |h4|
        puts h4.text.strip
      end

      while next_page_link = page.at('.pager a[text()=">"]')
        page = mechanize.get(next_page_link['href'])
        page.search('div.coursenamearea h4').each do |h4|
          name = h4.text
          @courses_array.push(name)
          puts h4.text.strip
        end
      end
    end
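One thing to watch in the combined method: the three `while` loops over the course results run one after another, so once the first loop has walked to the last page there is no `>` link left for the other two to follow, and the values on the first page are printed but never stored. A minimal sketch of one way around that is to visit each results page once and collect every field while you are there. It assumes each results page carries all three selectors and the same `.pager` markup used above. `each_page` and `scrape_courses` are hypothetical helper names, not part of Mechanize, and `agent` is expected to be a `Mechanize.new` instance (or anything with a compatible `get`).

```ruby
# Hypothetical helper: yields the start page, then every page reached by
# following the ">" pager link until no such link is left.
def each_page(agent, url)
  page = agent.get(url)
  loop do
    yield page
    next_link = page.at('.pager a[text()=">"]')
    break unless next_link
    page = agent.get(next_link['href'])
  end
end

# Hypothetical helper: collects course names, durations and qualifications
# in a single pass, so the first page is included and no loop is skipped.
def scrape_courses(agent, url)
  result = { courses: [], durations: [], quals: [] }
  each_page(agent, url) do |page|
    page.search('div.coursenamearea h4').each { |n| result[:courses] << n.text.strip }
    page.search('div.courseinfoduration').each { |n| result[:durations] << n.text.strip }
    page.search('div.courseinfooutcome').each { |n| result[:quals] << n.text.strip }
  end
  result
end

# Usage (assumes the mechanize gem is installed):
#   require 'mechanize'
#   data = scrape_courses(Mechanize.new, 'http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')
#   @courses_array  = data[:courses]
#   @duration_array = data[:durations]
#   @qual_array     = data[:quals]
```

The same `each_page` helper could also drive the university name scrape, since that loop follows the same pager link.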