Getting the Scrapyd jobid value inside a spider

Question

Framework: Scrapy, running on a Scrapyd server.

I have a problem getting the jobid value inside the spider.

After posting data to http://localhost:6800/schedule.json, the response is

status = ok
jobid = bc2096406b3011e1a2d0005056c00008
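
For reference, the scheduling call described above could look roughly like the sketch below. It assumes the response body is JSON with "status" and "jobid" keys, and that schedule.json accepts "project" and "spider" form fields; the function names and the project name in the usage note are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SCHEDULE_URL = "http://localhost:6800/schedule.json"


def parse_schedule_response(body):
    """Extract the jobid from a schedule.json response body (assumed JSON)."""
    data = json.loads(body)
    if data.get("status") != "ok":
        raise RuntimeError("scheduling failed: %r" % (data,))
    return data["jobid"]


def schedule_spider(project, spider, url=SCHEDULE_URL):
    """POST the project and spider names, then return the jobid from the reply."""
    payload = urlencode({"project": project, "spider": spider}).encode("ascii")
    with urlopen(url, data=payload) as resp:
        return parse_schedule_response(resp.read().decode("utf-8"))
```

Calling schedule_spider("myproject", "some") returns the jobid to the caller only; the spider process does not see this return value, which is the problem described next.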

But I need to use this jobid inside the running spider, while the job is in progress. It could be used to open the {jobid}.log file or for other dynamic purposes.

class SomeSpider(BaseSpider):
    name = "some"
    start_urls = ["http://www.example.com/"]
    def parse(self, response):
        items = []
        # `values` and `SomeItem` are placeholders for the real scraped data and item class
        for val in values:
            item = SomeItem()
            item['jobid'] = self.jobid  # ??? how do I get the jobid here?
            items.append(item)
        return items

But I can see this jobid only after the task has finished :( Thanks!

Recommended answer

I guess there is an easier way, but you can extract the job id from the command-line arguments. IIRC, scrapyd launches the spider and passes it the jobid as a parameter. Just inspect sys.argv wherever you need the jobid.
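
A minimal sketch of that idea follows. It rests on two assumptions that are not confirmed in the answer and should be checked against your Scrapyd version: that the job id reaches the spider process as a spider argument of the form _job=<jobid>, and that Scrapyd also exports it in an environment variable named SCRAPY_JOB. The helper name find_job_id is made up for illustration.

```python
import os
import sys


def find_job_id(argv=None, environ=None):
    """Return the Scrapyd job id if one can be found, else None."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # Assumption: the job id is passed as a spider argument "_job=<jobid>".
    for arg in argv:
        if arg.startswith("_job="):
            return arg.split("=", 1)[1]

    # Assumption: fall back to an environment variable named SCRAPY_JOB.
    return environ.get("SCRAPY_JOB")
```

Inside parse, the item field could then be filled with item['jobid'] = find_job_id(). If the job id does arrive as a spider argument, Scrapy normally copies spider arguments onto the spider instance, so getattr(self, '_job', None) may be enough without touching sys.argv at all.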

