问题描述
类似于
QDir和QDirIterator会忽略具有非ASCII文件名的文件
和
UnicodeEncodeError:"latin-1"编解码器无法对字符进行编码
关于上面的第二个链接,我在下面添加了test0().我的理解是utf-8是我要寻找的解决方案,但是a尝试对文件名进行编码失败.
With regard to the second link above, I added test0() below. My understanding was that utf-8 was the solution I was searching for, but alas trying to encode the filename fails.
def test0():
print("test0...using unicode literal")
name = u"123c\udcb4.wav"
test("test0b", name)
n = name.encode('utf-8')
print(n)
n = QtCore.QFile.decodeName(n)
print(n)
# From http://docs.python.org/release/3.0.1/howto/unicode.html
# This will indeed overwrite the correct file!
# f = open(name, 'w')
# f.write('blah\n')
# f.close()
Test0结果...
Test0 results...
test0...using unicode literal
test0b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test0b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test0b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test0b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
Traceback (most recent call last):
File "unicode.py", line 157, in <module>
test0()
File "unicode.py", line 42, in test0
n = name.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
编辑
进一步阅读 http://tools.ietf.org/html/rfc3629 我说"UTF-8的定义禁止在 U + D800和U + DFFF".因此,如果uft-8不允许使用这些字符.您应该如何处理这样命名的文件?Python可以为它们创建并测试它们的存在.因此,这为我指出了我的Qt api使用情况或Qt api本身有问题吗?!
Further reading from http://tools.ietf.org/html/rfc3629 tells me that "The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF". So if uft-8 doesn't allow these characters. How are you supposed to deal with a file that is so named? Python can create and test existence for them. So this points me at an issue with my Qt api usage or the Qt api itself?!
我正在竭尽全力地正确处理Python3中的unicode文件名.最终,我正在开发一个基于Phonon的音乐播放器.我试图尽可能地将问题隔离开来.从下面的代码中,您将看到我尝试了尽可能多的替代方法.我最初的反应是这里有错误....也许是我的...也许在一个或多个库中.任何帮助将不胜感激!
I am struggling to wrap my head around proper handling of unicode file name in Python3. Ultimately, I'm working on a Phonon based music player. I've tried to isolate the problem(s) from that as much as possible. From the code below you will see that I've tried as many alternatives as I can find. My initial response is that there are bugs here....maybe mine...maybe in one or more libraries. Any help would be much appreciated!
我有一个包含3个Unicode文件名123 [abc] U.wav的目录.前两个文件已正确处理...大部分...第三个123c只是错误.
I have a directory with 3 unicode file names 123[abc]U.wav. The first 2 files are handled properly...mostly...the third one 123c is just wrong.
from PyQt4 import QtGui, QtCore
import sys, os
def test(_name, _file):
# print(_name, repr(_file))
f = QtCore.QFile(_file)
# f = QtCore.QFile(QtCore.QFile.decodeName(test))
exists = f.exists()
try:
print(_name, "QFile.exists", f.fileName(), exists)
except UnicodeEncodeError as e:
print(e, repr(_file), exists)
fileInfo = QtCore.QFileInfo(_file)
exists = fileInfo.exists()
try:
print(_name, "QFileInfo.exists", fileInfo.fileName(), exists)
except UnicodeEncodeError as e:
print(e, repr(_file), exists)
exists = os.path.exists(_file)
try:
print(_name, "os.path.exists", _file, exists)
except UnicodeEncodeError as e:
print(e, repr(_file), exists)
exists = os.path.isfile(_file)
try:
print(_name, "os.path.isfile", _file, exists)
except UnicodeEncodeError as e:
print(e, repr(_file), exists)
print()
def test1():
args = QtGui.QApplication.arguments()
print("test1...using QtGui.QApplication.arguments()")
test("test1", args[1])
def test2():
print("test2...using sys.argv")
test("test2", sys.argv[1])
def test3():
print("test3...QtGui.QFileDialog.getOpenFileName()")
name = QtGui.QFileDialog.getOpenFileName()
test("test3", name)
def test4():
print("test4...QtCore.QDir().entryInfoList()")
p = os.path.abspath(__file__)
p, _ = os.path.split(p)
d = QtCore.QDir(p)
for inf in d.entryInfoList(QtCore.QDir.AllEntries|QtCore.QDir.NoDotAndDotDot|QtCore.QDir.System):
print("test4", inf.fileName())
# if str(inf.fileName()).startswith("123c"):
if u"123c\ufffd.wav" == inf.fileName():
# if u"123c\udcb4.wav" == inf.fileName(): # This check fails..even tho that is what is reported in error messages for test2
test("test4a", inf.fileName())
test("test4b", inf.absoluteFilePath())
def test5():
print("test5...os.listdir()")
p = os.path.abspath(__file__)
p, _ = os.path.split(p)
dirList = os.listdir(p)
for file in dirList:
fullfile = os.path.join(p, file)
try:
print("test5", file)
except UnicodeEncodeError as e:
print(e)
print("test5", repr(fullfile))
# if u"123c\ufffd.wav" == file: # This check fails..even tho it worked in test4
if u"123c\udcb4.wav" == file:
test("test5a", file)
test("test5b", fullfile)
print()
def test6():
print("test6...Phonon and QtGui.QFileDialog.getOpenFileName()")
from PyQt4.phonon import Phonon
class Window(QtGui.QDialog):
def __init__(self):
QtGui.QDialog.__init__(self, None)
self.mediaObject = Phonon.MediaObject(self)
self.audioOutput = Phonon.AudioOutput(Phonon.MusicCategory, self)
Phonon.createPath(self.mediaObject, self.audioOutput)
self.mediaObject.stateChanged.connect(self.handleStateChanged)
name = QtGui.QFileDialog.getOpenFileName()# works with python3..not for 123c
# name = QtGui.QApplication.arguments()[1] # works with python2..but not python3...not for 123c
# name = sys.argv[1] # works with python3..but not python2...not for 123c
# p = os.path.abspath(__file__)
# p, _ = os.path.split(p)
# print(p)
# name = os.path.join(p, str(name))
self.mediaObject.setCurrentSource(Phonon.MediaSource(name))
self.mediaObject.play()
def handleStateChanged(self, newstate, oldstate):
if newstate == Phonon.PlayingState:
source = self.mediaObject.currentSource().fileName()
print('test6 playing: :', source)
elif newstate == Phonon.StoppedState:
source = self.mediaObject.currentSource().fileName()
print('test6 stopped: :', source)
elif newstate == Phonon.ErrorState:
source = self.mediaObject.currentSource().fileName()
print('test6 ERROR: could not play:', source)
win = Window()
win.resize(200, 100)
# win.show()
win.exec_()
def timerTick():
QtGui.QApplication.exit()
if __name__ == '__main__':
app = QtGui.QApplication(sys.argv)
app.setApplicationName('unicode_test')
test1()
test2()
test3()
test4()
test5()
test6()
timer = QtCore.QTimer()
timer.timeout.connect(timerTick)
timer.start(1)
sys.exit(app.exec_())
使用123a的测试结果...
Test results with 123a...
python3 unicode.py 123a�.wav
test1...using QtGui.QApplication.arguments()
test1 QFile.exists unknown False
test1 QFileInfo.exists unknown False
test1 os.path.exists unknown False
test1 os.path.isfile unknown False
test2...using sys.argv
test2 QFile.exists 123a�.wav True
test2 QFileInfo.exists 123a�.wav True
test2 os.path.exists 123a�.wav True
test2 os.path.isfile 123a�.wav True
test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123a�.wav True
test3 QFileInfo.exists 123a�.wav True
test3 os.path.exists /home/mememe/Desktop/test/unicode/123a�.wav True
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123a�.wav True
test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False
test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False
test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'
test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'
test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'
test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123a�.wav
test6 playing: : /home/mememe/Desktop/test/unicode/123a�.wav
test6 stopped: : /home/mememe/Desktop/test/unicode/123a�.wav
使用123b的测试结果...
Test results with 123b...
python3 unicode.py 123bÆ.wav
test1...using QtGui.QApplication.arguments()
test1 QFile.exists 123b.wav False
test1 QFileInfo.exists 123b.wav False
test1 os.path.exists 123b.wav False
test1 os.path.isfile 123b.wav False
test2...using sys.argv
test2 QFile.exists 123bÆ.wav True
test2 QFileInfo.exists 123bÆ.wav True
test2 os.path.exists 123bÆ.wav True
test2 os.path.isfile 123bÆ.wav True
test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123bÆ.wav True
test3 QFileInfo.exists 123bÆ.wav True
test3 os.path.exists /home/mememe/Desktop/test/unicode/123bÆ.wav True
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123bÆ.wav True
test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False
test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False
test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'
test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'
test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'
test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123bÆ.wav
test6 playing: : /home/mememe/Desktop/test/unicode/123bÆ.wav
test6 stopped: : /home/mememe/Desktop/test/unicode/123bÆ.wav
使用123c的测试结果...
Test results with 123c...
python3 unicode.py 123c�.wav
test1...using QtGui.QApplication.arguments()
test1 QFile.exists unknown False
test1 QFileInfo.exists unknown False
test1 os.path.exists unknown False
test1 os.path.isfile unknown False
test2...using sys.argv
test2 QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test2 QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test2 os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test2 os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test3...QtGui.QFileDialog.getOpenFileName()
test3 QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test3 QFileInfo.exists 123c�.wav False
test3 os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test3 os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False
test4...QtCore.QDir().entryInfoList()
test4 123a�.wav
test4 123bÆ.wav
test4 123c�.wav
test4a QFile.exists 123c�.wav False
test4a QFileInfo.exists 123c�.wav False
test4a os.path.exists 123c�.wav False
test4a os.path.isfile 123c�.wav False
test4b QFile.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b QFileInfo.exists 123c�.wav False
test4b os.path.exists /home/mememe/Desktop/test/unicode/123c�.wav False
test4b os.path.isfile /home/mememe/Desktop/test/unicode/123c�.wav False
test4 unicode.py
test5...os.listdir()
test5 unicode.py
test5 '/home/mememe/Desktop/test/unicode/unicode.py'
test5 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed
test5 '/home/mememe/Desktop/test/unicode/123c\udcb4.wav'
test5a QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' False
test5a os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5a os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '123c\udcb4.wav' True
test5b QFile.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b QFileInfo.exists 'utf-8' codec can't encode character '\udcb4' in position 4: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' False
test5b os.path.exists 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5b os.path.isfile 'utf-8' codec can't encode character '\udcb4' in position 38: surrogates not allowed '/home/mememe/Desktop/test/unicode/123c\udcb4.wav' True
test5 123bÆ.wav
test5 '/home/mememe/Desktop/test/unicode/123bÆ.wav'
test5 123a�.wav
test5 '/home/mememe/Desktop/test/unicode/123a�.wav'
test6...Phonon and QtGui.QFileDialog.getOpenFileName()
test6 stopped: : /home/mememe/Desktop/test/unicode/123c�.wav
有关测试结果的有趣提示...
Interesting things to note about the test results...
- 对于所有3个文件,Test1均失败.
- 通过了所有3个文件的Test2 ...除了针对123c的QFile和QFileInfo测试之外
- Test3通过了123a和123b,但失败了123c
- Test4 ... QDir在目录中找到所有4个文件
- 所有文件的Test4a和Test4b失败
- Test5 ... os.listdir在目录中找到所有4个文件
- 注意:Test5a和test5b检查必须使用不同的unicode检查吗?!
- Test1 failed for all 3 files.
- Test2 passed for all 3 files...except for the QFile and QFileInfo tests for 123c
- Test3 passed for 123a and 123b but failed for 123c
- Test4 ...QDir found all 4 files in the directory
- Test4a and Test4b failed for all files
- Test5 ...os.listdir found all 4 files in the directory
- NOTE: The Test5a and test5b checks had to use a different unicode check?!
我知道这是很多信息...我没有想做透彻的了解.
I know that is a lot of information...I wast trying to be thorough.
因此,如果还有最后一个问题,那么在Python3中处理Unicode文件名的正确方法是什么?
So, if there is one final question is what is the right way to deal with unicode file names in Python3?
推荐答案
您是对的,
123c
是错误的.证据表明磁盘上的文件名包含无效的Unicode代码点U + DCB4 .当Python尝试打印该字符时,它会正确地抱怨它不能打印该字符.当Qt处理test4中的字符时,它也不能处理它,但是没有抛出错误,而是将其转换为 Unicode替换字符U + FFFD .显然,新文件名不再与磁盘上的文件名匹配.You're right,
123c
is just wrong. The evidence shows that the filename on disk contains an invalid Unicode codepoint U+DCB4. When Python tries to print that character, it rightly complains that it can't. When Qt processes the character in test4 it can't handle it either, but instead of throwing an error it converts it to the Unicode REPLACEMENT CHARACTER U+FFFD. Obviously the new filename no longer matches what's on disk.如果您自己进行转换并指定适当的错误处理,Python也可以在字符串中使用替换字符而不是引发错误.我手头没有Python 3进行测试,但我认为它将起作用:
Python can also use the replacement character in a string instead of throwing an error if you do the conversion yourself and specify the proper error handling. I don't have Python 3 on hand to test this but I think it will work:
filename = filename.encode('utf-8').decode('utf-8', 'replace')
这篇关于Python3 Qt Unicode文件名问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!