Problem description
I have written a function to scrape a website using cURL, but it returns nothing when called and I can't understand why. The output is empty.
<?php
function scrape($url)
{
$headers = Array(
"Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
"Cache-Control: max-age=0",
"Connection: keep-alive",
"Keep-Alive: 300",
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Accept-Language: en-us,en;q=0.5",
"Pragma: "
);
$config = Array(
CURLOPT_RETURNTRANSFER => TRUE ,
CURLOPT_FOLLOWLOCATION => TRUE ,
CURLOPT_AUTOREFERER => TRUE ,
CURLOPT_CONNECTTIMEOUT => 120 ,
CURLOPT_TIMEOUT => 120 ,
CURLOPT_MAXREDIRS => 10 ,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8" ,
CURLOPT_URL => $url ,
) ;
$handle = curl_init() ;
curl_setopt_array($handle,$config) ;
curl_setopt($handle,CURLOPT_HTTPHEADER,$headers) ;
$data = curl_exec($handle) ;
curl_close($handle) ;
return $data ;
}
echo scrape("https://www.google.com") ;
?>
Recommended answer
There are two possible fixes when trying to scrape an SSL or HTTPS URL:
- Quick fix
- Proper fix
The quick fix, first.
Warning: this defeats the very protection SSL is designed to provide.
Set: CURLOPT_SSL_VERIFYPEER => false
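As a minimal sketch of the quick fix in context, the options could be built and applied as shown here. The helper names `quick_fix_options` and `scrape_quick` are hypothetical and not part of the original answer. The sketch also returns the message from `curl_error()`, because `curl_exec()` returns `false` on failure and echoing `false` prints nothing, which would explain the empty output in the question.

```php
<?php
// Hypothetical helper (not from the original answer): builds the option
// array for the quick fix. Peer verification is switched off, so this is
// only suitable for confirming that certificate checking is what fails.
function quick_fix_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => false, // insecure: debugging only
    ];
}

// Hypothetical helper: runs the request and reports why it failed, if it did.
function scrape_quick(string $url): array
{
    $handle = curl_init();
    curl_setopt_array($handle, quick_fix_options($url));

    $data  = curl_exec($handle); // false on failure
    $error = ($data === false) ? curl_error($handle) : null;

    curl_close($handle);

    return ['data' => $data, 'error' => $error];
}
```

Calling `scrape_quick("https://www.google.com")` and inspecting the `error` entry should show whether the request still fails once verification is off. If it now succeeds, the certificate check was the likely cause and the proper fix is the one to apply.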
The second, and proper, fix. Set three options:
- CURLOPT_SSL_VERIFYPEER => true
- CURLOPT_SSL_VERIFYHOST => 2
- CURLOPT_CAINFO => getcwd() . '\CAcert.pem'
The last thing you need to do is download the CA certificate.
Go to http://curl.haxx.se/docs/caextract.html -> click 'cacert.pem' -> copy and paste the text into a text editor -> save the file as 'CAcert.pem'. Check that it isn't saved as 'CAcert.pem.txt'.
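Because a wrong file name (such as 'CAcert.pem.txt') or a saved HTML page instead of the certificate text would make the proper fix fail, it may be worth checking the file before handing it to cURL. The following is a minimal sketch under that assumption; `ca_bundle_path` is a hypothetical helper, not something from the original answer.

```php
<?php
// Hypothetical helper: resolves the CA bundle path inside $dir and fails
// early if the file is missing, unreadable, or does not look like PEM.
function ca_bundle_path(string $dir, string $name = 'CAcert.pem'): string
{
    // Strip any trailing slash or backslash, then join portably.
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;

    if (!is_readable($path)) {
        throw new RuntimeException("CA bundle not found or not readable: $path");
    }

    // A PEM bundle contains at least one certificate header line.
    if (strpos(file_get_contents($path), '-----BEGIN CERTIFICATE-----') === false) {
        throw new RuntimeException("File does not look like a PEM bundle: $path");
    }

    return $path;
}
```

The result could then be passed as `CURLOPT_CAINFO => ca_bundle_path(getcwd())`. Using `DIRECTORY_SEPARATOR` instead of a hard-coded backslash is a design choice of this sketch so that the same path works on both Windows and Linux.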
<?php
function scrape($url)
{
$headers = Array(
"Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
"Cache-Control: max-age=0",
"Connection: keep-alive",
"Keep-Alive: 300",
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Accept-Language: en-us,en;q=0.5",
"Pragma: "
);
$config = Array(
CURLOPT_SSL_VERIFYPEER => true,
CURLOPT_SSL_VERIFYHOST => 2,
CURLOPT_CAINFO => getcwd() . '\CAcert.pem',
CURLOPT_RETURNTRANSFER => TRUE ,
CURLOPT_FOLLOWLOCATION => TRUE ,
CURLOPT_AUTOREFERER => TRUE ,
CURLOPT_CONNECTTIMEOUT => 120 ,
CURLOPT_TIMEOUT => 120 ,
CURLOPT_MAXREDIRS => 10 ,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8" ,
CURLOPT_URL => $url
) ;
$handle = curl_init() ;
curl_setopt_array($handle,$config) ;
curl_setopt($handle,CURLOPT_HTTPHEADER,$headers) ;
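// Create the result object explicitly instead of relying on auto-creation
$output = new stdClass() ;
$output->data = curl_exec($handle) ;
// Check the stored result rather than calling curl_exec() a second time
if($output->data === false) {
$output->error = 'Curl error: ' . curl_error($handle);
} else {
$output->error = 'Operation completed without any errors';
}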
curl_close($handle) ;
return $output ;
}
$scrape = scrape("https://www.google.com") ;
echo $scrape->data;
//uncomment for errors
//echo $scrape->error;
?>