How do I scrape a site that requires authentication using node.js?

Question

I've come across many tutorials explaining how to use node.js to scrape public websites that don't require authentication/login.

Can somebody explain how to scrape sites that require login using node.js?

Recommended answer

Use Mikeal's Request library with cookie support enabled, like this:

// jar: true makes Request keep cookies between calls
var request = require('request').defaults({jar: true});

So first create an account on that site (manually), then pass the username and password as parameters in a POST request to the site's login endpoint. The server will respond with a cookie, which Request will remember, so you will then be able to access the pages that require you to be logged in.
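The two steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the helper name `loginAndFetch`, the `options` object, the URLs, and the form field names are all hypothetical, and the real field names depend on the login form of the site you are scraping. The `request` argument is assumed to be an instance created with `jar: true` as shown earlier.

```javascript
// Sketch: log in once, then reuse the session cookie for a protected page.
// `request` is assumed to come from require('request').defaults({jar: true}).
// `options.loginUrl`, `options.protectedUrl`, and the keys inside
// `options.credentials` are hypothetical and must match the target site.
function loginAndFetch(request, options, callback) {
  // Step 1: POST the credentials as form fields to the login endpoint.
  request.post({url: options.loginUrl, form: options.credentials}, function (err, res) {
    if (err) return callback(err);
    // A login POST often answers with a redirect (3xx),
    // so only 4xx/5xx responses are treated as a failed login here.
    if (res.statusCode >= 400) {
      return callback(new Error('Login failed with status ' + res.statusCode));
    }
    // Step 2: the cookie jar now holds the session cookie,
    // so this GET is sent as the logged-in user.
    request.get(options.protectedUrl, function (err2, res2, body) {
      if (err2) return callback(err2);
      callback(null, body);
    });
  });
}
```

You would call it with something like `loginAndFetch(request, {loginUrl: ..., credentials: {username: ..., password: ...}, protectedUrl: ...}, function (err, body) { ... })` and then parse `body` as you would for a public page. Some sites report a wrong password with a 200 status and an error page, so you may need to inspect the returned body as well.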

Note: this approach doesn't work if the login page uses something like reCaptcha.
