PHP cURL: Operation timed out after 120308 milliseconds with X out of -1 bytes received


This article covers how to handle the PHP error "cURL Operation timed out after 120308 milliseconds with X out of -1 bytes received"; it may be a useful reference if you are running into the same problem.

Problem description


I'm occasionally experiencing this error (see title) in my scraping script.


X is an integer number of bytes > 0: the actual number of bytes the web server sent in response. I debugged this issue with Charles proxy, and here is what I see.


As you can see, there is no Content-Length: header in the response, and the proxy keeps waiting for more data (so cURL waited for 2 minutes and gave up).


The cURL error code is 28.


Below is some debug info from the verbose cURL output, along with the var_export'ed curl_getinfo() of that request:

* About to connect() to proxy 127.0.0.1 port 8888 (#584)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x2f14d58
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 584 (0x2f14d58) send_pipe: 1, recv_pipe: 0
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#584)
> GET http://bakersfield.craigslist.org/sof/3834062623.html HTTP/1.0
User-Agent: Firefox (WindowsXP) - Mozilla/5.1 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Host: bakersfield.craigslist.org
Accept: */*
Referer: http://bakersfield.craigslist.org/sof/3834062623.html
Proxy-Connection: Keep-Alive

< HTTP/1.1 200 OK
< Cache-Control: max-age=300, public
< Last-Modified: Thu, 11 Jul 2013 21:50:17 GMT
< Date: Thu, 11 Jul 2013 21:50:17 GMT
< Vary: Accept-Encoding
< Content-Type: text/html; charset=iso-8859-1
< X-MCP-Cache-Control: max-age=2592000, public
< X-Frame-Options: SAMEORIGIN
* Server Apache is not blacklisted
< Server: Apache
< Expires: Thu, 11 Jul 2013 21:55:17 GMT
* HTTP/1.1 proxy connection set close!
< Proxy-Connection: Close
<
* Operation timed out after 120308 milliseconds with 4636 out of -1 bytes received
* Closing connection 584
Curl error: 28 Operation timed out after 120308 milliseconds with 4636 out of -1 bytes received http://bakersfield.craigslist.org/sof/3834062623.html
array (
  'url' => 'http://bakersfield.craigslist.org/sof/3834062623.html',
  'content_type' => 'text/html; charset=iso-8859-1',
  'http_code' => 200,
  'header_size' => 362,
  'request_size' => 337,
  'filetime' => -1,
  'ssl_verify_result' => 0,
  'redirect_count' => 0,
  'total_time' => 120.308,
  'namelookup_time' => 0,
  'connect_time' => 0,
  'pretransfer_time' => 0,
  'size_upload' => 0,
  'size_download' => 4636,
  'speed_download' => 38,
  'speed_upload' => 0,
  'download_content_length' => -1,
  'upload_content_length' => 0,
  'starttransfer_time' => 2.293,
  'redirect_time' => 0,
  'certinfo' =>
  array (
  ),
  'primary_ip' => '127.0.0.1',
  'primary_port' => 8888,
  'local_ip' => '127.0.0.1',
  'local_port' => 63024,
  'redirect_url' => '',
)


Can I add a cURL option to avoid these timeouts? This is neither a connection timeout nor a data-wait timeout; both of those settings have no effect here, because cURL actually connects successfully and receives some data, which is why the timeout in the error is always ~120000 ms.
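For transfers that stall mid-body like this, libcurl also offers a low-speed abort (CURLOPT_LOW_SPEED_LIMIT / CURLOPT_LOW_SPEED_TIME), which cuts a connection that stops delivering data instead of waiting for a total-time cap. A minimal sketch (the function name and all timeout values are my own illustrative choices, not from the original post):

```php
<?php
// Sketch: abort a transfer that stalls, rather than waiting out a long
// total timeout. CURLOPT_TIMEOUT caps the whole request; the low-speed
// pair aborts as soon as the server goes quiet mid-response -- which is
// the symptom here (partial body received, then silence).
function fetch_with_stall_guard($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_CONNECTTIMEOUT  => 10, // fail if the TCP connect takes > 10 s
        CURLOPT_TIMEOUT         => 60, // hard cap on the whole request
        CURLOPT_LOW_SPEED_LIMIT => 1,  // abort if we receive
        CURLOPT_LOW_SPEED_TIME  => 15, // < 1 byte/s for 15 s straight
    ));
    $body = curl_exec($ch);
    if ($body === false && curl_errno($ch) === CURLE_OPERATION_TIMEDOUT) {
        $body = null; // error 28: the caller can retry, back off, or skip
    }
    curl_close($ch);
    return $body;
}
```

With these settings, a server that sends 4636 bytes and then hangs would be dropped after ~15 seconds instead of two minutes, still surfacing as cURL error 28.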

Recommended answer


I noticed you're trying to parse Craigslist; could it be anti-flood protection on their side? Does the problem still occur if you try to parse another website? I once had the same issue while trying to recursively map an FTP server.


Regarding the timeouts: if you are sure it is neither a connection timeout nor a data-wait timeout (CURLOPT_CONNECTTIMEOUT / CURLOPT_TIMEOUT), I'd try increasing the time limit of the PHP script itself:

set_time_limit(0);

That concludes this look at the PHP error "cURL Operation timed out after 120308 milliseconds with X out of -1 bytes received"; hopefully the recommended answer helps.
