Problem description
I'm new to Python and am having problems understanding Unicode. I'm using Python 3.4. I've spent an entire day trying to figure this out by reading about Unicode, including http://www.fileformat.info/info/unicode/char/201C/index.htm and http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html
I need to refer to the special (curly) quote characters because they are used in the text I'm analyzing. I did test that the Windows 7 command window can read and write these two special quote characters. To keep things simple, I wrote a one-line script:
print ('“') # that's the special quote mark (U+201C) in between normal single quotes
and got the following output:
Traceback (most recent call last):
File "C:\Users\David\Documents\Python34\Scripts\wordCount3.py", line 1, in <module>
print ('\u201c')
File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 0: character maps to <undefined>
So how do I write something that refers to these two characters, U+201C and U+201D?
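In Python 3 source code the two characters can be written as a \u escape, by their Unicode name with \N{...}, or as the literal character (the last form assumes the script file is saved as UTF-8). All three spellings give the same one-character str, so the script can count or replace the quotes without ever printing them. A minimal sketch; the variable names are illustrative only:

```python
# Three equivalent ways to spell U+201C / U+201D in Python 3 source code.
LEFT_QUOTE = "\u201c"                            # hex escape
RIGHT_QUOTE = "\N{RIGHT DOUBLE QUOTATION MARK}"  # Unicode character name
ALSO_LEFT = "“"                                  # literal (file must be saved as UTF-8)

assert LEFT_QUOTE == ALSO_LEFT == "\N{LEFT DOUBLE QUOTATION MARK}"
assert RIGHT_QUOTE == "\u201d"

text = "He said “hello” twice."

# Counting and replacing works regardless of what the console can display.
print(text.count(LEFT_QUOTE), text.count(RIGHT_QUOTE))  # prints: 1 1
plain = text.replace(LEFT_QUOTE, '"').replace(RIGHT_QUOTE, '"')
print(plain)  # prints: He said "hello" twice.
```

Because the printed results are plain ASCII, this runs even on a console whose code page lacks the curly quotes.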
Is this the correct encoding choice in the file open statement?
with open(fileIn, mode='r', encoding='utf-8', errors='replace') as f:
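For reference, a minimal sketch of how that open() call could be used to count the two quote characters in a file, assuming the file was saved as UTF-8; the helper name count_curly_quotes is hypothetical. Note that errors='replace' turns undecodable bytes into U+FFFD instead of raising, which hides a wrong-encoding problem rather than fixing it.

```python
import os
import tempfile


def count_curly_quotes(path):
    """Return (left, right) counts of U+201C / U+201D in a UTF-8 text file."""
    left = right = 0
    # errors='replace' maps any undecodable bytes to U+FFFD instead of raising.
    with open(path, mode="r", encoding="utf-8", errors="replace") as f:
        for line in f:
            left += line.count("\u201c")
            right += line.count("\u201d")
    return left, right


# Demo: write a small UTF-8 file to a temporary path, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, mode="w", encoding="utf-8") as f:
    f.write("\u201cOne\u201d and \u201ctwo\u201d\n")
print(count_curly_quotes(path))  # prints: (2, 2)
os.remove(path)
```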
Recommended answer
The reason is that in Python 3.x you can't just mix Unicode strings with byte strings. You have probably read documentation written for Python 2.x, where this is possible as long as the byte string contains convertible characters.
print('\u201c', '\u201d')
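works fine for me, so the problem must be the encoding of your source file or your terminal. Your traceback points to the terminal: the output goes through cp437.py, the codec for your console's code page (cp437), which has no mapping for U+201C.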
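The same failure can be reproduced without a console by encoding explicitly, which also shows that Python 3 never combines str and bytes implicitly. A minimal sketch; cp437 is used only because it is the codec named in the traceback:

```python
quote = "\u201c"

# Python 3 refuses to combine text (str) and bytes implicitly.
try:
    quote + b"abc"
except TypeError as exc:
    print("TypeError:", exc)

# cp437 (the console code page from the traceback) has no mapping for U+201C,
# so this raises the same UnicodeEncodeError that print() hit.
try:
    quote.encode("cp437")
except UnicodeEncodeError as exc:
    print("cp437 failed:", exc.reason)

# UTF-8 can encode every code point; U+201C becomes three bytes.
encoded = quote.encode("utf-8")
print(encoded)                           # prints: b'\xe2\x80\x9c'
print(encoded.decode("utf-8") == quote)  # prints: True
```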
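You can also tell Python explicitly which encoding your source file uses by putting the following line at the top of it (in Python 3 the default source encoding is already UTF-8, so this only matters if the file is saved in some other encoding):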
# -*- coding: utf-8 -*-
Added: it seems that you're working on a Windows machine. If so, you can change your console code page to UTF-8 by running
chcp 65001
before you start the Python interpreter. This change is temporary; to make it permanent, run the following .reg file:
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Console]
"CodePage"=dword:fde9