Problem description
I'm writing a specialized PHP proxy and got stumped by a feature of cURL.
If the following values are set:
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_HEADER, true );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
cURL correctly handles redirects, but returns ALL page headers, not just the final (non-redirect) page, e.g.
HTTP/1.1 302 Found
Location: http://otherpage
Set-Cookie: someCookie=foo
Content-Length: 198

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 3241

<!DOCTYPE HTML>
...rest of content
Note that CURLOPT_HEADER is set because I need to read and copy parts of the original header into my proxy header.
I appreciate why it's returning all these headers (for example, my proxy code must detect any cookies set in the 302 header and pass them along). HOWEVER, it also makes it impossible to detect when the headers end and the content begins. Normally, with one header we could just do a simple split:
$split = preg_split('/\r\n\r\n/', $fullPage, 2);
But that obviously won't work here. Hm. We could try something that only splits if it looks like the next line is part of a header:
$split = preg_split('/\r\n\r\nHTTP\/(1\.0|1\.1) \\d+ \\w+/', $fullPage);
// matches patterns such as "\r\n\r\nHTTP/1.1 302 Found"
Which will work almost all the time, but chokes if someone has the following in their page:
...and for all you readers out there, here is an example HTTP header:
<PRE>
HTTP/1.1 200 OK
BALLS!
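To make the failure concrete, here is a minimal sketch that runs the status-line split against a made-up response. The `$fullPage` contents are an assumption for illustration (a 302 followed by a 200 whose body happens to contain a blank line followed by text shaped like a status line), and the pattern uses `HTTP/` since that is how real status lines begin.

```php
<?php
// Made-up transfer: two real header blocks, then a body that contains
// a blank line followed by text that looks like an HTTP status line.
$fullPage = "HTTP/1.1 302 Found\r\n"
    . "Location: http://otherpage\r\n"
    . "\r\n"
    . "HTTP/1.1 200 OK\r\n"
    . "Content-Type: text/html; charset=utf-8\r\n"
    . "\r\n"
    . "<PRE>here is an example HTTP header:\r\n"
    . "\r\n"
    . "HTTP/1.1 200 OK\r\n"
    . "</PRE>";

// Split wherever a blank line is followed by something shaped like a status line.
$parts = preg_split('/\r\n\r\nHTTP\/(1\.0|1\.1) \d+ \w+/', $fullPage);

// Two responses should ideally yield two pieces, but the look-alike
// inside the body triggers an extra split.
$partCount = count($parts);
echo "pieces: {$partCount}\n";
```

Besides the extra piece, the delimiter itself (including the matched status line) is consumed by `preg_split`, so the status code of every response after the first is lost as well.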
We really want the split to stop matching as soon as it encounters any \r\n\r\n that isn't immediately followed by HTTP/1.x. Is there a way to do this with PHP regular expressions? Even this solution can choke on the (quite rare) situation where someone puts an HTTP header right at the beginning of their content. Is there a way in cURL to get all of the returned pages as an array?
Recommended answer
You can get the total header size from curl_getinfo() and split the string up like this:
$buffer = curl_exec($ch);
$curl_info = curl_getinfo($ch);
curl_close($ch);
$header_size = $curl_info["header_size"];
$header = substr($buffer, 0, $header_size);
$body = substr($buffer, $header_size);
Information taken from the helpful post by "grandpa" (http://forums.digitalpoint.com/showthread.php?t=474585).
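Once the header portion is cut off by size, it can be split further into one block per response, which is roughly the "array of pages" the question asks about. The following is only a sketch under stated assumptions: the buffer is simulated rather than fetched, `$headerSize` stands in for what `curl_getinfo($ch, CURLINFO_HEADER_SIZE)` would report, and `splitHeaderBlocks()` and `collectCookies()` are hypothetical helper names, not part of cURL.

```php
<?php
// Simulated curl_exec() result with CURLOPT_HEADER and
// CURLOPT_FOLLOWLOCATION enabled: two header blocks, then the body.
$headers = "HTTP/1.1 302 Found\r\n"
    . "Location: http://otherpage\r\n"
    . "Set-Cookie: someCookie=foo\r\n"
    . "Content-Length: 198\r\n"
    . "\r\n"
    . "HTTP/1.1 200 OK\r\n"
    . "Content-Type: text/html; charset=utf-8\r\n"
    . "Content-Length: 3241\r\n"
    . "\r\n";
// The body deliberately contains a fake status line after a blank line.
$body = "<!DOCTYPE HTML>\r\n<PRE>\r\n\r\nHTTP/1.1 200 OK\r\n</PRE>";
$buffer = $headers . $body;

// Assumption: with a real handle this value would come from
// curl_getinfo($ch, CURLINFO_HEADER_SIZE).
$headerSize = strlen($headers);

// Hypothetical helper: cut the buffer at the header size, then split
// only the header portion into one block per response.
function splitHeaderBlocks(string $buffer, int $headerSize): array
{
    $rawHeaders = substr($buffer, 0, $headerSize);
    $body = substr($buffer, $headerSize);
    // Each block ends with a blank line; drop the empty trailing piece.
    $blocks = preg_split('/\r\n\r\n/', $rawHeaders, -1, PREG_SPLIT_NO_EMPTY);
    return [$blocks, $body];
}

// Hypothetical helper: gather Set-Cookie values from every block,
// including the ones sent with redirect responses.
function collectCookies(array $blocks): array
{
    $cookies = [];
    foreach ($blocks as $block) {
        foreach (explode("\r\n", $block) as $line) {
            if (stripos($line, 'Set-Cookie:') === 0) {
                $cookies[] = trim(substr($line, strlen('Set-Cookie:')));
            }
        }
    }
    return $cookies;
}

[$blocks, $bodyOut] = splitHeaderBlocks($buffer, $headerSize);
$cookies = collectCookies($blocks);
```

Because the regex only ever sees the header portion, header-like text inside the page content cannot cause a false split.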