I have a text file with the following content:
str = '0|Crazy Taxi\xe2\x84\xa2 City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'
I want to remove the "\xe2\x84\xa2" sequence using the following line of code:
print unicode(str,errors="ignore")
output = '0|Crazy Taxi City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'
However, when I run the same logic on the complete file using the code below:
with open('train_data_dump.txt', mode='r') as document:
    for line in document:
        print unicode(line, errors='ignore')
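it prints each line unchanged, with the "\xe2\x84\xa2" still in place.
Feel free to ask if my question is not clear enough. Any help is appreciated.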
Best answer
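When the value comes from a file, it behaves as if you had assigned a raw string: each backslash is treated as an ordinary character. The file holds the literal characters \, x, e, 2 and so on, all of which are plain ASCII, so errors="ignore" finds nothing to drop. In a string literal in source code, by contrast, Python turns \xe2\x84\xa2 into three non-ASCII bytes before unicode() ever sees them. You therefore need to decode the escaped characters first: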
unicode(line.decode("string_escape"), errors="ignore")
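The snippet above is Python 2. On Python 3 the "string_escape" codec no longer exists, so a rough equivalent might look like the sketch below. It is only an illustration under a few assumptions: the file contains ASCII text plus \xNN-style escapes (a \uNNNN escape above U+00FF would make the latin-1 step fail), and the function name strip_escaped_non_ascii is made up for this example.

```python
def strip_escaped_non_ascii(raw_line: bytes) -> str:
    """Turn literal '\\xNN' escapes into real bytes, then drop non-ASCII bytes.

    Hypothetical helper, assuming the input is ASCII text with \\xNN escapes.
    """
    # "unicode_escape" interprets the backslash escapes; each \xNN becomes
    # the code point U+00NN, so encoding as latin-1 recovers the raw bytes.
    unescaped = raw_line.decode("unicode_escape").encode("latin-1")
    # Mirror the original unicode(..., errors="ignore"): discard every byte
    # that is not valid ASCII.
    return unescaped.decode("ascii", errors="ignore")


# The sample holds literal backslashes, as a line read from the file would.
sample = b"0|Crazy Taxi\\xe2\\x84\\xa2 City Rush^Truck Racing Super Gear^"
print(strip_escaped_non_ascii(sample))
# prints: 0|Crazy Taxi City Rush^Truck Racing Super Gear^
```

To apply it to the whole file, open train_data_dump.txt in binary mode (mode='rb') and call the helper on each line. If you would rather keep the trademark sign than delete it, decoding the unescaped bytes as UTF-8 instead of ASCII should yield "™", since \xe2\x84\xa2 is its UTF-8 encoding.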
Reference: the "Python Specific Encodings" section of the codecs module documentation.