Problem description
I get a UnicodeEncodeError when writing text with a special character to a file:
  File "D:\SOFT\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 956: character maps to <undefined>
My code:
expFile = open(expFilePath, 'w')
# data var is what contains a special char
expFile.write("\n\n" + data)
The data is probably some weird character from something like Microsoft Word that got pasted into the application's HTML form and was persisted; now I am importing it. I can't even see it: it shows as a diamond in my DB editor when I query it, and as a placeholder in the text editor. The input should have been checked more rigorously for character set compliance, but it was not.
Is there a way to encode the data to make any character digestible for I/O processing?
Alternatively, is there a way to check whether my str complies with the character set expected by file I/O, so that I can replace any data that violates it?
Solution
Your problem is that opening the file in text mode on your Windows system defaulted to the locale code page, cp1252, an ASCII superset that can only encode a tiny fraction of the Unicode range.
To fix it, supply a more comprehensive encoding that supports the whole Unicode range; open accepts a keyword argument to override the default encoding, so it's as simple as changing:
expFile = open(expFilePath, 'w')
to
expFile = open(expFilePath, 'w', encoding='utf-8')
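As a rough sketch of that change in context (the sample string and the temporary file paths are assumptions made up for illustration, not the asker's real data), the snippet below first confirms that cp1252 cannot encode U+FFFD, then writes the text with encoding='utf-8'. It also shows the errors keyword argument of open, which may help if the file has to stay in cp1252 and lossy substitution is acceptable:

```python
import os
import tempfile

# Hypothetical stand-in for the imported data; U+FFFD is the character
# named in the traceback.
data = "caf\u00e9 \ufffd text"

# Check whether the string fits cp1252 before writing.
try:
    data.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Hypothetical output locations inside a temporary directory.
out_dir = tempfile.mkdtemp()
utf8_path = os.path.join(out_dir, "export_utf8.txt")
cp1252_path = os.path.join(out_dir, "export_cp1252.txt")

# Option 1: write as UTF-8, which can represent every character in the str.
with open(utf8_path, "w", encoding="utf-8") as exp_file:
    exp_file.write("\n\n" + data)

with open(utf8_path, encoding="utf-8") as exp_file:
    round_tripped = exp_file.read()

# Option 2: keep cp1252 but substitute "?" for anything it cannot encode.
# This loses information, so it only suits cases where that is acceptable.
with open(cp1252_path, "w", encoding="cp1252", errors="replace") as exp_file:
    exp_file.write(data)

with open(cp1252_path, encoding="cp1252") as exp_file:
    replaced = exp_file.read()
```

Using a with block also closes the file automatically, which the original snippet leaves to the caller.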
Depending on your needs, I'd choose either utf-8 or utf-16; the former is more compact for mostly ASCII text, and is commonly seen everywhere, while the latter matches Microsoft's typical encoding for storing portable (non-locale dependent) text, so it's possible a few Windows-specific text editors would recognize it/handle it more easily.
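To make the compactness point concrete, here is a small sketch comparing encoded sizes for a mostly ASCII string (the sample text is an assumption chosen for illustration). Note that Python's "utf-16" codec prepends a 2-byte byte order mark when encoding:

```python
# Hypothetical sample: mostly ASCII, plus the character from the traceback.
text = "Mostly ASCII text with one special character: \ufffd"

# UTF-8 uses 1 byte per ASCII character and 3 bytes for U+FFFD.
utf8_size = len(text.encode("utf-8"))

# UTF-16 uses 2 bytes per character here, plus a 2-byte byte order mark.
utf16_size = len(text.encode("utf-16"))

print(f"utf-8: {utf8_size} bytes, utf-16: {utf16_size} bytes")
```

For text dominated by non-Latin scripts the gap narrows or can reverse, so the comparison above only reflects the mostly ASCII case the answer describes.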