ruby - Mechanize HTML scraping problem

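I'm trying to extract the email addresses from my website using Ruby Mechanize and Hpricot.
I loop over all the pages of my admin side and parse each one with Hpricot. So far so good. Then I get: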

Exception `Net::HTTPBadResponse' at /usr/lib/ruby/1.8/net/http.rb:2022 - wrong status line: *SOME HTML CODE HERE*

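After it has parsed a bunch of pages, it starts with a timeout and then prints the HTML code of the page.
I can't understand why. How can I debug it?
It seems that Mechanize can't fetch more than about 10 pages in a row. Is that possible?
Thanks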



require 'logger'
require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'open-uri'


class Harvester

def initialize(page)
    @page=page
    @agent = WWW::Mechanize.new{|a| a.log = Logger.new("logs.log") }
    @agent.keep_alive=false
    @agent.read_timeout=15

end

def login
    f = @agent.get( "http://****.com/admin/index.asp") .forms.first
    f.set_fields(:username => "user", :password =>"pass")
        f.submit

  end

def harvest(s)
    pageNumber=1
    #@agent.read_timeout =
    s.upto(@page) do |pagenb|

    puts "*************************** page= #{pagenb}/#{@page}***************************************"
    begin
        #time=Time.now
        #search=@agent.get( "http://****.com/admin/members.asp?action=search&term=&state_id=&r=500&p=#{page}")
        extract(pagenb)

    rescue => e
        puts  "unknown #{e.to_s}"
        #puts  "url:http://****.com/admin/members.asp?action=search&term=&state_id=&r=500&p=#{page}"
        #sleep(2)
        extract(pagenb)

    rescue Net::HTTPBadResponse => e
        puts "net exception"+ e.to_s
    rescue WWW::Mechanize::ResponseCodeError => ex
        puts "mechanize error: "+ex.response_code
    rescue Timeout::Error => e
        puts "timeout: "+e.to_s
    end


end

end

def extract(page)
    #puts search.body
    search=@agent.get( "http://****.com/admin/members.asp?action=search&term=&state_id=&r=500&p=#{page}")
    doc = Hpricot(search.body)
        #remove titles
        #~ doc.search("/html/body/div/table[2]/tr/td[2]/table[3]/tr[1]").remove

        (doc/"/html/body/div/table[2]/tr/td[2]/table[3]//tr").each do |tr|
            #delete the phone number from the html
            temp = tr.search("/td[2]").inner_html
            index = temp.index('<')
            email = temp[0..index-1]
            puts  email
            f=File.open("./emails", 'a')
            f.puts(email)
            f.close
        end

end
end

puts "Starting to extract emails..."
start = ARGV[0].to_i
h = Harvester.new(186)
h.login
h.harvest(start)
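
Best answer

Mechanize keeps the full content of every page it fetches in its history, which can cause problems when you browse through many pages. To limit the size of the history, try: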
@mech = WWW::Mechanize.new do |agent|
  agent.history.max_size = 1
end
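
To illustrate why capping the history helps, here is a minimal plain-Ruby sketch. `PageHistory` is a hypothetical stand-in, not Mechanize's own history class, and the 1,000-byte string is a placeholder for one fetched page. The eviction policy shown (drop the oldest entry once the cap is exceeded) is an assumption about how such a cap behaves.

```ruby
# Hypothetical stand-in for an agent's page history (not Mechanize's own class).
class PageHistory
  attr_reader :pages

  # A max_size of nil means "keep everything".
  def initialize(max_size = nil)
    @max_size = max_size
    @pages = []
  end

  # Remember a fetched page body, evicting the oldest entries past the cap.
  def push(body)
    @pages << body
    @pages.shift while @max_size && @pages.size > @max_size
    self
  end

  # Total bytes of page bodies currently retained.
  def retained_bytes
    @pages.inject(0) { |sum, body| sum + body.bytesize }
  end
end

fake_page = "x" * 1_000 # placeholder for one page of HTML

unbounded = PageHistory.new
bounded   = PageHistory.new(1)

10.times do
  unbounded.push(fake_page)
  bounded.push(fake_page)
end

puts "unbounded: #{unbounded.pages.size} pages, #{unbounded.retained_bytes} bytes"
puts "bounded:   #{bounded.pages.size} page(s), #{bounded.retained_bytes} bytes"
```

In this sketch the uncapped history still holds every body after ten fetches, while the capped one holds only the most recent. With 500 results per page, as in the question's URL, the retained data grows accordingly on every request.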