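I'm referring to the following question on Stack Overflow:
Scrapy, scrapping data inside a javascript
I'm trying to replicate @Rho's answer to that question to learn how to scrape data from a JavaScript-generated form. The form's payload appears to have changed since the question was posted, so I modified it accordingly.
My code and output are below: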
$ scrapy shell https://www.mcdonalds.com.sg/locate-us/
2015-07-07 12:09:28+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
.....
2015-07-07 12:09:28+0800 [default] INFO: Spider opened
2015-07-07 12:09:32+0800 [default] DEBUG: Crawled (200) <GET https://www.mcdonalds.com.sg/locate-us/> (referer: None)
....
>>> url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
>>> payload = {'action':'store_locator_locations'}
>>> head = {'X-Requested-With':'XMLHttpRequest'}
>>> from scrapy.http import FormRequest
>>> req=FormRequest(url,formdata=payload,headers=head)
>>> fetch(req)
2015-07-07 12:12:24+0800 [default] DEBUG: Crawled (404) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)
The expected response is 200, but as you can see above, I get a 404 error code.
Best answer
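This isn't a problem with your code as such. The original question and answer you refer to are from 2013, a lifetime ago on the internet.
Things have changed for McDonald's Singapore and WordPress, though not by much.
What used to be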
url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
is now
url = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'
(I found this by opening Chrome's F12 developer tools and looking at the Network tab.)
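To make the change concrete, here is a minimal sketch that builds the same form-encoded POST against the corrected endpoint. It deliberately swaps Scrapy's FormRequest for the standard library's urllib so it can run outside the Scrapy shell, and it only constructs the request without sending it. The helper name build_locator_post is an assumption made for illustration, and whether the site still answers this POST has not been verified here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Corrected endpoint from the answer: note the extra /wp/ path segment.
URL = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'


def build_locator_post(url=URL):
    """Build (but do not send) the form-encoded POST from the question.

    Hypothetical helper; payload and header are copied from the question.
    """
    payload = {'action': 'store_locator_locations'}
    head = {'X-Requested-With': 'XMLHttpRequest'}
    body = urlencode(payload).encode('ascii')
    # Passing data= makes urllib treat the request as a POST.
    return Request(url, data=body, headers=head)


req = build_locator_post()
print(req.get_method(), req.full_url)
```

In the Scrapy shell itself, changing only the url assignment and re-running the FormRequest and fetch(req) lines from the question should be the equivalent step.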
In fact, you can issue a GET request to this URL and get JSON back:
GET https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php?action=store_locator_locations
[{
"id": "417",
"name": "McDonald\u2019s JCube",
"address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
"city": "Singapore",
"lat": "1.33352",
"long": "103.740277",
"op_hours": "Mon-Fri: Opens at 0630<br>\r\nSat-Sun: Opens at 0700<br>\r\nSun-Thur: Closes at 2300 <br>\r\nFri\/Sat & PH Eve: Closes at 0000\r\n<br><br>\r\nDessert Kiosk: Daily 1100 - 2300",
"phone": "66844228",
"region": "west",
"types": ["3"],
"zip": "609731"
},
...
]
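As a rough illustration of working with this response, the sketch below builds the GET URL and parses a trimmed copy of the record shown above. The helper names locator_url and clean_store, and the choice of which fields to keep, are assumptions made for this example rather than anything from the original answer. The sample is embedded as a string so the code runs without network access.

```python
import json
import re
from urllib.parse import urlencode

BASE = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'


def locator_url(base=BASE):
    """Return the GET URL with the action passed as a query parameter."""
    return base + '?' + urlencode({'action': 'store_locator_locations'})


def clean_store(raw):
    """Hypothetical helper: keep a few fields and tidy the embedded HTML."""
    # Split the address on <br/> or <br> and trim surrounding whitespace.
    parts = [p.strip() for p in re.split(r'<br\s*/?>', raw['address'])]
    return {
        'name': raw['name'],
        'address': ', '.join(p for p in parts if p),
        'lat': float(raw['lat']),
        'long': float(raw['long']),
        'zip': raw['zip'],
    }


# Trimmed copy of the first record shown above, kept as raw JSON text.
SAMPLE = r'''[{"id": "417", "name": "McDonald\u2019s JCube",
"address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
"lat": "1.33352", "long": "103.740277", "zip": "609731"}]'''

stores = [clean_store(s) for s in json.loads(SAMPLE)]
print(locator_url())
print(stores[0])
```

Inside a Scrapy callback, the same clean_store step could be applied after passing the response body to json.loads, assuming the endpoint keeps returning a list of records in this shape.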