I am referring to the following question on Stack Overflow:
Scrapy, scrapping data inside a javascript

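I am trying to reproduce @Rho's answer to that question to learn how to scrape data from a JavaScript-generated form. The form's payload seems to have changed since the question was posted, so I modified it accordingly.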

My code and output are as follows:

$ scrapy shell https://www.mcdonalds.com.sg/locate-us/

2015-07-07 12:09:28+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
.....
2015-07-07 12:09:28+0800 [default] INFO: Spider opened
2015-07-07 12:09:32+0800 [default] DEBUG: Crawled (200) <GET https://www.mcdonalds.com.sg/locate-us/> (referer: None)
....
>>> url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
>>> payload = {'action':'store_locator_locations'}
>>> head = {'X-Requested-With':'XMLHttpRequest'}
>>> from scrapy.http import FormRequest
>>> req=FormRequest(url,formdata=payload,headers=head)
>>> fetch(req)
2015-07-07 12:12:24+0800 [default] DEBUG: Crawled (404) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)


The expected response is 200, but as you can see above, I get a 404 status code instead.

Best answer

This is not a problem with the code itself. The original question and answer you refer to are from 2013, a lifetime ago on the internet.

Things have changed for McDonald's Singapore and WordPress, but not by much.

What used to be

url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'


is now

url = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'


(I found this by opening the Chrome developer tools with F12 and looking at the Network tab.)

In fact, you can simply make a GET request to this URL and get JSON back:


  GET
  
  https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php?action=store_locator_locations
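As a minimal sketch of that GET request outside the Scrapy shell, the snippet below builds the same URL with the standard library and fetches it. The helper names `build_locations_url` and `fetch_locations` are made up for illustration, and sending the `X-Requested-With` header is an assumption carried over from the question; the endpoint may well answer without it, and the site may have changed again since this was written.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Corrected endpoint from the answer (note the extra "/wp" path segment).
AJAX_URL = "https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php"


def build_locations_url(base=AJAX_URL):
    """Return the GET URL with the action passed as a query parameter."""
    return base + "?" + urlencode({"action": "store_locator_locations"})


def fetch_locations(timeout=10):
    """Fetch and decode the store list (hypothetical helper, needs network access)."""
    # Assumption: the header is copied from the question and may not be required.
    req = Request(build_locations_url(),
                  headers={"X-Requested-With": "XMLHttpRequest"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_locations_url())
```

Inside `scrapy shell`, the same URL string can be passed straight to `fetch()`, which accepts either a URL or a request object.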


[{
    "id": "417",
    "name": "McDonald\u2019s JCube",
    "address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
    "city": "Singapore",
    "lat": "1.33352",
    "long": "103.740277",
    "op_hours": "Mon-Fri: Opens at 0630<br>\r\nSat-Sun: Opens at 0700<br>\r\nSun-Thur: Closes at 2300 <br>\r\nFri\/Sat & PH Eve: Closes at 0000\r\n<br><br>\r\nDessert Kiosk: Daily 1100 - 2300",
    "phone": "66844228",
    "region": "west",
    "types": ["3"],
    "zip": "609731"
},
...
]
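The records come back with every value as a string and with `<br>` tags embedded in the address. The following is a rough sketch of how such a response body might be turned into cleaner Python dictionaries. `SAMPLE` is an abbreviated copy of the record shown above, and `clean_text` and `parse_locations` are illustrative names, not part of Scrapy or of the site's API.

```python
import json
import re

# Abbreviated copy of the record above; the raw string keeps the JSON escapes intact.
SAMPLE = r'''[{"id": "417", "name": "McDonald\u2019s JCube",
"address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
"city": "Singapore", "lat": "1.33352", "long": "103.740277",
"phone": "66844228", "region": "west", "types": ["3"], "zip": "609731"}]'''

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def clean_text(value):
    """Split on <br> tags, trim whitespace, and join the non-empty parts with commas."""
    parts = (part.strip() for part in BR_TAG.split(value))
    return ", ".join(part for part in parts if part)


def parse_locations(body):
    """Decode the JSON body and convert the numeric fields from strings."""
    stores = []
    for rec in json.loads(body):
        stores.append({
            "id": int(rec["id"]),
            "name": rec["name"],
            "address": clean_text(rec["address"]),
            "lat": float(rec["lat"]),
            "long": float(rec["long"]),
            "zip": rec["zip"],  # kept as a string to preserve any leading zeros
        })
    return stores


if __name__ == "__main__":
    for store in parse_locations(SAMPLE):
        print(store["name"], "|", store["address"])
```

In a Scrapy callback the same function could be fed the response body decoded as text, under the assumption that the endpoint still returns a JSON array of this shape.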
