Python UnicodeDecodeError on Mac, but not on PC?

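Problem description

I have a script that aggregates students' code files into a single file for plagiarism detection. It walks a file tree and copies the contents of every file into one output file.

I've run the script on exactly the same files on my Mac and on my PC. On my PC it works fine. On my Mac it hits 27 UnicodeDecodeErrors (probably 0.1% of all the files I'm testing).

What could cause a UnicodeDecodeError on a Mac, but not on a PC?

If relevant, the code is: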

originalFile = open(originalFilename, "r")
newFile = open(newFilename, "a")
newFile.write(originalFile.read())

Solution

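When open() is called without an encoding argument, Python decodes the file with the platform's default (locale) encoding. On macOS that is typically UTF-8, which rejects byte sequences that are not valid UTF-8. On Windows it is typically a legacy 8-bit code page such as cp1252, which maps almost every byte to some character, so the same files decode there without errors, although not necessarily correctly.

Figure out what encoding was used when each file was saved. A safe bet is to load the file as 'utf-8'. If that succeeds, it is likely to be the correct encoding.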

# try utf-8. If this fails, all bets are off.
open(originalFilename, "r", encoding="utf-8")
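
To see how the same bytes can fail under one codec and pass under another, here is a minimal sketch. The sample byte string is made up for illustration, and it assumes the failing files were saved in an 8-bit encoding such as cp1252, which is only one possible explanation.

```python
import locale

# Encoding that open() uses when no encoding argument is given
# (unless Python's UTF-8 mode is enabled).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample: "café" saved in an 8-bit encoding,
# where the accented letter is the single byte 0xE9.
sample = b"caf\xe9"

# An 8-bit code page accepts the byte and maps it to a character.
decoded_cp1252 = sample.decode("cp1252")

# UTF-8 rejects it: 0xE9 starts a multi-byte sequence that never completes.
try:
    sample.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print("cp1252 result:", decoded_cp1252)
print("utf-8 raised UnicodeDecodeError:", utf8_failed)
```

Running the first two statements on both machines should show whether their default encodings actually differ.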

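Now, if students are sending you these files, they most likely just used the default encoding on their own systems. The encoding cannot be guessed reliably. If they used an 8-bit codec, such as one of the ISO-8859 character sets, it is almost impossible to tell which one. What to do then depends on what kind of files you're processing.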
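
For source code that is mostly ASCII, one pragmatic option is to try UTF-8 and fall back to Latin-1 when that fails. The sketch below is an assumption about what is acceptable for plagiarism detection, not a way to recover the true encoding. The helper names `read_text_lenient` and `append_file` are hypothetical.

```python
def read_text_lenient(path):
    """Return the file's text, trying UTF-8 first and falling back to Latin-1."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but non-ASCII characters may come out wrong.
        with open(path, "r", encoding="latin-1") as f:
            return f.read()


def append_file(original_filename, new_filename):
    """Append one file's contents to the aggregate file, written as UTF-8."""
    text = read_text_lenient(original_filename)
    # An explicit output encoding keeps the result identical on Mac and PC.
    with open(new_filename, "a", encoding="utf-8") as new_file:
        new_file.write(text)
```

If losing the odd character is acceptable, passing errors="replace" to open() is another way to avoid the exception.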

