I'm using Python 2.7 and Scrapy 1.3.
My Scrapy code is:
import scrapy

class CinemaSpider(scrapy.Spider):
    name = "cinema"
    allowed_domains = ['cineroxy.com.br']
    start_urls = [
        'http://cineroxy.com.br/programacao-brisamar',
    ]

    def parse(self, response):
        movie_names = response.css('.titulo p::text').extract()
        for movie_name in movie_names:
            yield {
                'name': movie_name.strip()
            }
I run it like this:
C:\Python27\Scripts>scrapy runspider cinema_scraper.py -o movies.json
The result is:
[
{"name": "A Bailarina"},
{"name": "Assassins Creed - O Filme"},
{"name": "Cinquenta Tons Mais Escuros"},
{"name": "Minha M\u00e3e \u00e9 uma Pe\u00e7a 2"},
{"name": "Moana - Um Mar de Aventura"},
{"name": "Os Penetras 2 - Quem D\u00e1 Mais?"},
{"name": "Quatro Vidas de Um Cachorro"},
{"name": "Resident Evil 6: O \u00daltimo Cap\u00edtulo"},
{"name": "xXx: Reativado"}
]
How can I get the accented characters to show up correctly in
Minha M\u00e3e \u00e9 uma Pe\u00e7a 2
Os Penetras 2 - Quem D\u00e1 Mais?
Resident Evil 6: O \u00daltimo Cap\u00edtulo
?
Thanks in advance.
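Best answer
Use the FEED_EXPORT_ENCODING setting. By default, Scrapy's JSON exporter escapes non-ASCII characters as \uXXXX sequences; the scraped data is correct, just not human-readable. Setting an encoding makes the exporter write the characters themselves: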
FEED_EXPORT_ENCODING = 'utf-8'
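The escaping comes from JSON encoding itself, not from the scraping. A minimal stdlib-only sketch (plain json module, not Scrapy code) compares the two modes, assuming the exporter's default behaves like json.dumps with ensure_ascii=True:

```python
import json

# A title from the question's output. Here \u00e3 etc. are Python string
# escapes, so `title` holds the real accented characters.
title = u"Minha M\u00e3e \u00e9 uma Pe\u00e7a 2"

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# which is what movies.json in the question contains.
escaped = json.dumps({"name": title})

# ensure_ascii=False keeps the characters as-is; the text must then be
# written out with an encoding such as UTF-8.
readable = json.dumps({"name": title}, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same data: the escaped file was never wrong,
# only harder to read.
assert json.loads(escaped) == json.loads(readable) == {"name": title}
```

In other words, any JSON parser reading the original movies.json gets the accented titles back; the setting only changes how the file looks.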
You can set FEED_EXPORT_ENCODING in settings.py, in the spider's custom_settings attribute, or on the command line:
scrapy runspider cinema_scraper.py -s FEED_EXPORT_ENCODING=utf8 -o movies.json
Source: "css - Enabling accents in Scrapy's JSON export?" on Stack Overflow: https://stackoverflow.com/questions/41934287/