Problem description
I want to scrape Amazon's all-deals page:
http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1
So I am using PHP cURL:
$request = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$request);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 80);
$file_source = curl_exec($ch);
print_r($file_source);
exit;
The scrape completes, but the content div in the response page is empty. All of the content is loaded by dynamic Ajax requests on Amazon's side. How can I scrape all the deal products using PHP and cURL?
My response (image link)
Updated code:
$request = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
/*$header[] = "Accept-Language: en-US,en;q=0.5";*/
/* $header[] = "Accept-Encoding: gzip, deflate";*/
$header[] = 'Cookie: x-wl-uid=1vlKm5hBxhHPg37UgkrAPYZZaV0wv+T5knGezWJq0AIEWI30hJYp0XouddMIZeemj1LKAi9fDQq7aoFN+mbvlVYPTBQVLFdzs0aeTGWtiCY0Ay63L0ezPfZRKXQHC
/Wum4ywRviFW9es=; session-id-time=2082787201l; session-id=192-9168386-7231424; ubid-main=187-6710460-8617661
; session-token="+SFC4vDx/BvcD8D1Mdgeo2jtnTD0qPHF5j2nWNwbFGcRyW7/o4LBOmBHJosU5W0SgoAd6lhi0NZWg/6o5WE6o45k
+VCT5a5dgj0tltSEkBT80oWT0CDk+jCDEEhIcxnCe6aqkUn6soFiMJHIsMWujo4qyA6A70PC1xKGKdIFMUm3H0DGSdIMqITs4Mjb1
/1vY6GxnPeh5ncasxl+tUN2dHVwwJbj1ZrmyJdDxSDd8/o="; __utma=194891197.2101747155.1434117141.1434356635.1434362529
.4; __utmz=194891197.1434362529.4.4.utmccn=(referral)|utmcsr=stackoverflow.com|utmcct=/questions/11589556
/retrieving-an-amazon-stores-list-of-products-using-php|utmcmd=referral; x-main="Xi0312Ip8BrjoFoj6Zp9OLxDcU6kCvlm4DExlT5yNgHa2b3htenxvUsF2TZR3
?Fn"; s_pers=%20s_vnum%3D1866356399079%2526vn%253D2%7C1866356399079%3B%20s_invisit%3Dtrue%7C1434364356330
%3B%20s_nr%3D1434362556331-Repeat%7C1442138556331%3B; csm-hit=b-1RHERWP84F8S70KRQ903|1434453087266; preferred-geo
=national; UserPref=O9NYa0FpfOIAcRMnkQf7WL3LyhrjCsMBKgKfVxT4zK8uOTF5KjzPAwmz0DuVnfXhdkinEE1BEMgPn09eHwavl
+Hwl1BOSvjp1ewiG1iCXa0R77FsPOGbpq06MWB0MC7Wwff4gehUEAle5IfyFQqKGh1XvJ4YiMFsR2mwmyzzVJTo0WPGZzvvpCVLFmx22cRVwEi4sX8y
+IfEKu76B4p1GHPdZVo1HIwLooo8CT7lboNUi4Hhn6mhtyGCNEDLvWD8NII48Vd9EkcBjUpiSeNroRjYO9yNkj8SI3xJVI0befNipOfxAzPSnuQqeBpqm99bWArk9ZZl
+EM5QKzoPNJSF0FqVnnYavt4G6F/PHedaJVl8pU0A6N9lBjK6YZRFflyaoEYPtUW+nqK0xqO+YusAMAlhHBuW33KMdtt3i6oufQ4yTDqIgAiQ1ZTXcsb2tcu
; s_dslv=1434370132739; lc-main=en_US; aws-target-visitor-id=1434357190046-572838.22_02; aws-target-data
=%7B%22support%22%3A%221%22%7D; s_fid=7BB6DD9CE8128EC3-2A07290402DD6AF6; s_vn=1465893191447%26vn%3D1
; s_nr=1434370132733-New; s_vnum=1866370132735%26vn%3D1; skin=noskin; b2b-main=0';
$header[] = "Connection: keep-alive";
$reffer = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$request);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_TIMEOUT, 80);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
$file_source = curl_exec($ch);
print_r($file_source);
Based on my quick research, you can query the XHRs that the Amazon page makes to request deals.
See the screenshot. But if you query them with PHP cURL, you should imitate the HTTP headers of that particular request (including cookies).
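As a rough illustration of imitating a request's headers, here is a minimal sketch. The helper names `build_header_lines` and `apply_xhr_headers` are hypothetical, and the header values in the comment are only examples, not values confirmed for Amazon's XHR; copy the real ones from the request shown in your browser's developer tools.

```php
<?php
// Hypothetical helper: turn ['Name' => 'value'] pairs into the
// "Name: value" lines that CURLOPT_HTTPHEADER expects.
function build_header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Hypothetical helper: apply the copied headers and the referer to an
// existing cURL handle created with curl_init().
function apply_xhr_headers($ch, array $headers, string $referer): void
{
    curl_setopt($ch, CURLOPT_HTTPHEADER, build_header_lines($headers));
    curl_setopt($ch, CURLOPT_REFERER, $referer);
}

// Example call (header values are assumptions, not taken from Amazon):
// apply_xhr_headers($ch, [
//     'Accept'           => 'application/json, text/javascript, */*; q=0.01',
//     'X-Requested-With' => 'XMLHttpRequest',
// ], 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1');
```

Keeping the headers in an associative array makes it easier to compare them one by one against what the browser actually sent.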
Update
Based on your new curl request...
- The Amazon page (its JS logic) makes an XHR to its server for each product item. The XHRs look like this:
http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache=1434445645152
not http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1
which is only the referer.
- A request for a product item is a POST, not a GET.
- You probably took the cookies from your browser and inserted them into the PHP cURL header. That is wrong: those cookies belong to your browser session and are unrelated to the session of the PHP server that will request the XHRs. Use a cookie jar for this instead; see the post.
- The POST payload is an object and must be formed with a known structure. Form data:
{"requestMetadata":{"marketplaceID":"ATVPDKIKX0DER","sessionID":"175-4567874-0146849","clientID":"goldbox"},"widgetContext":{"pageType":"GoldBox","subPageType":"AllDeals","deviceType":"pc","refRID":"1VFVJBKEYZT3DGWSANXQ","widgetID":"1969939662","slotName":"center-6"},"page":1,"dealsPerPage":8,"itemResponseSize":"NONE","queryProfile":{"featuredOnly":false,"dealTypes":["LIGHTNING_DEAL","BEST_DEAL"],"includedCategories":["283155","599858","154606011"],"excludedExtendedFilters":{"MARKETING_ID":["restrictedcontent"]}}}
See the developer tools picture:
- As Michael - sqlbot mentioned, what you are trying to do violates Amazon's terms of use. But for the sake of the scraping technique I have still updated my answer.
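The points above can be sketched roughly as follows. This is only an illustration under several assumptions: that the first GET sets a `session-id` cookie for the PHP client, that the endpoint accepts the payload with a JSON `Content-Type`, and that the `refRID` and `widgetID` values copied from the form data above are still accepted. The function names `build_deal_payload`, `find_cookie`, and `fetch_deals` are hypothetical.

```php
<?php
// Build the POST body. Field names and fixed values are copied from the
// form data above; only sessionID, page and dealsPerPage are parameters.
function build_deal_payload(string $sessionId, int $page = 1, int $dealsPerPage = 8): string
{
    return json_encode([
        'requestMetadata' => [
            'marketplaceID' => 'ATVPDKIKX0DER',
            'sessionID'     => $sessionId,
            'clientID'      => 'goldbox',
        ],
        'widgetContext' => [
            'pageType'    => 'GoldBox',
            'subPageType' => 'AllDeals',
            'deviceType'  => 'pc',
            'refRID'      => '1VFVJBKEYZT3DGWSANXQ', // captured value, may be per page load
            'widgetID'    => '1969939662',           // captured value, may be per page load
            'slotName'    => 'center-6',
        ],
        'page'             => $page,
        'dealsPerPage'     => $dealsPerPage,
        'itemResponseSize' => 'NONE',
        'queryProfile'     => [
            'featuredOnly'            => false,
            'dealTypes'               => ['LIGHTNING_DEAL', 'BEST_DEAL'],
            'includedCategories'      => ['283155', '599858', '154606011'],
            'excludedExtendedFilters' => ['MARKETING_ID' => ['restrictedcontent']],
        ],
    ]);
}

// Look up a cookie value in the tab-separated Netscape-format lines
// returned by curl_getinfo($ch, CURLINFO_COOKIELIST).
function find_cookie(array $cookieLines, string $name): ?string
{
    foreach ($cookieLines as $line) {
        $fields = explode("\t", $line);
        if (count($fields) >= 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}

// GET the deals page so the PHP client gets its own cookies, then POST to
// the XHR endpoint on the same handle. Returns the response body or false.
function fetch_deals(string $cookieJarPath)
{
    $pageUrl = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
    $xhrUrl  = 'http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache='
             . (int) (microtime(true) * 1000);

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 80,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0',
        CURLOPT_COOKIEFILE     => $cookieJarPath, // read cookies, enable cookie engine
        CURLOPT_COOKIEJAR      => $cookieJarPath, // write cookies when the handle is released
        CURLOPT_URL            => $pageUrl,
    ]);
    if (curl_exec($ch) === false) {
        return false;
    }

    $sessionId = find_cookie(curl_getinfo($ch, CURLINFO_COOKIELIST), 'session-id');
    if ($sessionId === null) {
        return false; // assumption did not hold: no session cookie was set
    }

    curl_setopt_array($ch, [
        CURLOPT_URL        => $xhrUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => build_deal_payload($sessionId),
        CURLOPT_REFERER    => $pageUrl,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'], // assumed
    ]);
    return curl_exec($ch);
}

// Usage: echo fetch_deals(sys_get_temp_dir() . '/amazon_cookies.txt');
```

Reusing one handle keeps the cookies from the first response in memory for the second request, so the session belongs to the PHP client rather than to the browser.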