How to fix "TypeError: Cannot mix str and non-str arguments"?
Problem description
I'm writing some scraping code and getting the error above. My code is as follows.
# -*- coding: utf-8 -*-
import scrapy
from myproject.items import Headline


class NewsSpider(scrapy.Spider):
    name = 'IC'
    allowed_domains = ['kosoku.jp']
    start_urls = ['http://kosoku.jp/ic.php']

    def parse(self, response):
        """
        extract target urls and combine them with the main domain
        """
        for url in response.css('table a::attr("href")'):
            yield(scrapy.Request(response.urljoin(url), self.parse_topics))

    def parse_topics(self, response):
        """
        pick up necessary information
        """
        item=Headline()
        item["name"]=response.css("h2#page-name ::text").re(r'.*(インターチェンジ)')
        item["road"]=response.css("div.ic-basic-info-left div:last-of-type ::text").re(r'.*道$')
        yield item
When I run these selectors individually in the shell, I get the correct response, but once they are inside the program and it runs, it fails with this error:
2017-11-27 18:26:17 [scrapy.core.scraper] ERROR: Spider error processing <GET http://kosoku.jp/ic.php> (referer: None)
Traceback (most recent call last):
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/sonogi/scraping/myproject/myproject/spiders/IC.py", line 16, in parse
    yield(scrapy.Request(response.urljoin(url), self.parse_topics))
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/http/response/text.py", line 82, in urljoin
    return urljoin(get_base_url(self), url)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 424, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
2017-11-27 18:26:17 [scrapy.core.engine] INFO: Closing spider (finished)
I'm really confused. Thanks in advance for any help!
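The last frames of the traceback show where the error really comes from: Scrapy's response.urljoin() passes its argument straight to the standard library's urllib.parse.urljoin, whose internal _coerce_args check raises this TypeError when the base URL is a str and the other argument is a non-empty, non-str object. The sketch below reproduces that with the standard library only. StandInSelector is a hypothetical class standing in for a real Scrapy Selector (it is not Scrapy's API), and the href "ic/123.html" is made up for illustration.

```python
from urllib.parse import urljoin

BASE = "http://kosoku.jp/ic.php"


class StandInSelector:
    """Hypothetical stand-in for a Scrapy Selector: wraps a string but is not a str."""

    def __init__(self, text):
        self.text = text

    def extract(self):
        # Mimics the idea of Selector.extract(): hand back the underlying string.
        return self.text


def join_or_error(base, url):
    """Return the joined URL, or the TypeError message if urljoin rejects the arguments."""
    try:
        return urljoin(base, url)
    except TypeError as exc:
        return "TypeError: %s" % exc


sel = StandInSelector("ic/123.html")  # hypothetical relative href

print(join_or_error(BASE, sel))            # → TypeError: Cannot mix str and non-str arguments
print(join_or_error(BASE, sel.extract()))  # → http://kosoku.jp/ic/123.html
```

Passing the object itself fails exactly as in the traceback, while passing the plain string it wraps joins normally.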
Recommended answer
According to the Scrapy documentation, the .css(selector) method you're using returns a SelectorList instance. Iterating over it yields Selector objects, not strings, so response.urljoin(url) ends up passing a Selector to urllib.parse.urljoin, which raises this TypeError because the base URL is a str and url is not. If you want the actual (unicode) string version of the URL, call the extract() method:
    def parse(self, response):
        for url in response.css('table a::attr("href")').extract():
            yield(scrapy.Request(response.urljoin(url), self.parse_topics))
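With extract(), each url in the loop is a plain str, and response.urljoin(url) reduces to urljoin(get_base_url(response), url), as the traceback shows. The sketch below illustrates, using only the standard library, what that join produces. It assumes the page URL is the base (it ignores any <base href> tag, which get_base_url would honour), and the href values are hypothetical stand-ins for what extract() would return.

```python
from urllib.parse import urljoin

# Page URL from the question; a <base href> tag would change the base, which this sketch ignores.
PAGE_URL = "http://kosoku.jp/ic.php"

# Hypothetical href values standing in for the list of str that
# response.css('table a::attr("href")').extract() returns.
hrefs = ["ic/123.html", "/ic.php?id=5", "http://example.com/other"]


def absolute_urls(base, hrefs):
    """Resolve each str href against base, as response.urljoin() does once it gets a str."""
    return [urljoin(base, href) for href in hrefs]


for url in absolute_urls(PAGE_URL, hrefs):
    print(url)
# http://kosoku.jp/ic/123.html
# http://kosoku.jp/ic.php?id=5
# http://example.com/other
```

Relative hrefs are resolved against the page's directory or host, and absolute hrefs pass through unchanged. That is why converting each Selector to a str first is all the original spider needs.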