1. 安装hanziconv
安装一个简繁体转换的包:
pip install hanziconv
2. 自定义一个itempiples
找到项目中的pipelines.py文件
添加自定义的pipeline:
from hanziconv import HanziConv class HanziconvPipeline(object): def process_item(self, item, spider):
project_info = item['project_info']
for key, value in project_info.items():
if value is not None:
if isinstance(value, unicode):
value = HanziConv.toTraditional(str(value))
print key, value
project_info[key] = value
else: # 不为中文不处理
pass
else: # value为None 初始化为空串
project_info[key] = ""
return item
此代码为本人项目代码,判断value为unicode,则转换为繁体;
若要将繁体转换为简体,请将toTraditional改为toSimplified。
3. 配置项目pipeline
找到settings.py中的ITEM_PIPELINES
添加自定义的pipelines:
ITEM_PIPELINES = {
'scrapy_redis.pipelines.RedisPipeline': 400,
'<project_name>.pipelines.HanziconvPipeline': 300
}
:warning: <project_name>需手动修改为自己的项目名称!
转载于 https://blog.csdn.net/weixin_34082854/article/details/87429754