Question
Is it possible to crawl local files with Scrapy 0.18.4 without having an active project? I've seen this answer and it looks promising, but to use the `crawl` command you need a project.
Alternatively, is there an easy/minimalist way to set up a project for an existing spider? I have my spider, pipelines, middleware, and items defined in one Python file. I've created a scrapy.cfg file with only the project name. This lets me use `crawl`, but since I don't have a spiders folder Scrapy can't find my spider. Can I point Scrapy to the right directory, or do I need to split my items, spider, etc. into separate files?
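For reference, a minimal sketch of what that wiring might look like; the file and module names here are hypothetical. Scrapy locates spiders through the `SPIDER_MODULES` setting, so a settings module that points at the single file containing your spider should be enough - no spiders folder required.

```
# scrapy.cfg -- minimal; "settings" names the settings.py module next to it
[settings]
default = settings
```

```python
# settings.py -- point Scrapy at the module that defines the spider class,
# so `scrapy crawl <name>` can find it without a spiders/ package
BOT_NAME = 'myproject'            # hypothetical project name
SPIDER_MODULES = ['my_spider']    # hypothetical: the one .py file holding the spider
```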
[edit] I forgot to say that I'm running the spider using `Crawler.crawl(my_spider)` - ideally I'd still like to be able to run the spider like that, but I can run it in a subprocess from my script if that's not possible.
Turns out the suggestion in the answer I linked does work: http://localhost:8000 can be used as a `start_url`, so there's no need for a project.
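As an illustration, a minimal sketch of that approach using the Scrapy 0.18-era API (the spider name, port, and XPath are arbitrary): serve the directory of local files with Python's built-in HTTP server, then point `start_urls` at it.

```python
# Serve the local files first (Python 2, matching the Scrapy 0.18 era):
#   python -m SimpleHTTPServer 8000
from scrapy.spider import BaseSpider          # renamed scrapy.Spider in later versions
from scrapy.selector import HtmlXPathSelector

class LocalSpider(BaseSpider):
    name = 'local'
    start_urls = ['http://localhost:8000']    # the locally served directory

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # the directory index lists each local file as a link
        for href in hxs.select('//a/@href').extract():
            self.log('found local file: %s' % href)
```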
Answer
As an option, you can run Scrapy from a script; here is a self-contained example script and an overview of the approach used.
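For reference, a sketch of what such a script looked like with the Scrapy 0.18-era API; the spider module and class names are hypothetical. The idea is to create a `Crawler`, hand it a spider instance via `crawl()`, and drive the Twisted reactor yourself.

```python
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings

from my_spider import MySpider    # hypothetical: the file that defines your spider

spider = MySpider()
settings = CrawlerSettings()      # default settings; no project required
crawler = Crawler(settings)
# stop the reactor once the spider finishes, otherwise the script never exits
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)             # the Crawler.crawl(my_spider) call from the question
crawler.start()
log.start()
reactor.run()                     # blocks until reactor.stop() fires
```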
This doesn't mean you have to put everything in one file. You can still have `spider.py`, `items.py`, and `pipelines.py` - just import them correctly in the script you start crawling from.
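For example, a hypothetical `run.py` sitting next to those files might wire them together like this (names assumed; in Scrapy 0.18, pipelines are enabled as a list of dotted paths via `settings.overrides`, while newer versions use a dict of path to priority):

```python
# run.py -- hypothetical names; lives alongside spider.py and pipelines.py
from scrapy.settings import CrawlerSettings

from spider import MySpider       # spider class defined in spider.py

settings = CrawlerSettings()
# enable the pipeline from pipelines.py by its dotted path
settings.overrides['ITEM_PIPELINES'] = ['pipelines.MyPipeline']
```

From there, pass `settings` and `MySpider()` to the same `Crawler`/reactor code shown in the script above.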