Problem Description
I'm trying to mirror a website recursively, i.e. to get all the pages of one site. All the pages are in subfolders of just one folder, so I could easily mirror everything using wget:
wget --mirror --recursive --page-requisites --adjust-extension --no-parent --convert-links https://www.example.com/
However, the page is mirrored before some JS scripts are executed, and those JS scripts don't get mirrored. I need to mirror them too, somehow, because they change the webpage's DOM. Another option would be to wait for the site to finish loading and then mirror the loaded page (the task isn't time-critical).
I've already tried mirroring the webpage with PhantomJS, but I couldn't get PhantomJS to recurse, or at least I couldn't figure out how. I also took a closer look at the wget man page, but couldn't find any corresponding option.
Is it even possible to do this?
Recommended Answer
wget doesn't execute any JavaScript. You might need to go through a proxy like Splash. I've used Splash before with Scrapy spiders, but never with wget. Worth trying, though.
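For what it's worth, a minimal sketch of that Scrapy + Splash combination (not anything from the original answer) might look like the following. It assumes a local Splash instance, e.g. started with "docker run -p 8050:8050 scrapinghub/splash", plus the scrapy-splash plugin; the domain, wait time and output folder "mirror/" are placeholders.

import os
from urllib.parse import urlparse

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy_splash import SplashRequest


class MirrorSpider(scrapy.Spider):
    name = "mirror"
    allowed_domains = ["www.example.com"]      # placeholder domain
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            # Splash executes the page's JavaScript and returns the
            # resulting DOM to Scrapy.
            yield SplashRequest(url, self.parse, args={"wait": 2.0})

    def parse(self, response):
        self.save_page(response)
        # Follow in-domain links recursively, again through Splash.
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            if urlparse(url).netloc in self.allowed_domains:
                yield SplashRequest(url, self.parse, args={"wait": 2.0})

    def save_page(self, response):
        # Map the URL path onto a local file layout, roughly like wget --mirror.
        path = urlparse(response.url).path.lstrip("/") or "index.html"
        if path.endswith("/"):
            path += "index.html"
        if not os.path.splitext(path)[1]:
            path += ".html"
        local = os.path.join("mirror", path)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as f:
            f.write(response.body)  # rendered HTML returned by Splash


if __name__ == "__main__":
    process = CrawlerProcess(settings={
        "SPLASH_URL": "http://localhost:8050",
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_splash.SplashCookiesMiddleware": 723,
            "scrapy_splash.SplashMiddleware": 725,
            "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
        },
        "SPIDER_MIDDLEWARES": {
            "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
        },
        "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
    })
    process.crawl(MirrorSpider)
    process.start()

Because Splash hands back the DOM after the page's scripts have run, the saved files reflect the rendered page rather than the raw source; rewriting links to point at the local copies, as wget --convert-links does, would still have to be handled separately.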