Problem description
I have a Python 3 program that monitors a log file. Among other things, the log includes chat messages written by users. The log is created by a third-party application that I cannot change.
Today a user wrote a message that was logged as the bytes "\xed\xa0\xbd\xed\xb1\x8c", and it caused the program to crash with the following error:
future: <Task finished coro=<updateConsoleLog() done, defined at /usr/local/src/bserver/logmonitor.py:48> exception=UnicodeDecodeError('utf-8',...
say "\xed\xa0\xbd\xed\xb1\x8c"\r\n', 7623, 7624, 'invalid continuation byte')>
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/tasks.py", line 238, in _step
    result = next(coro)
  File "/usr/local/src/bserver/logmonitor.py", line 50, in updateConsoleLog
    server_events = self.console.getUpdate()
  File "/usr/local/src/bserver/console.py", line 79, in getUpdate
    return self.read()
  File "/usr/local/src/bserver/console.py", line 90, in read
    for line in itertools.islice(log_file, log_no, None):
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7623: invalid continuation byte
ERROR:asyncio:Task exception was never retrieved
Using 'file -i log.file' I determined that the log file is us-ascii. This shouldn't be an issue, as ascii is a subset of utf-8 (as far as I know).
Since this happens rarely and I don't mind losing whatever it is that this user typed, is it possible for me to ignore this line or the particular characters that can't be decoded and just keep on reading the rest of the file?
I considered using try: ... except UnicodeDecodeError as ..., but this would mean I can't read anything in the log file after the error.
Code
def read(self):
    log_no = self.last_log_no
    log_file = open(self.path, 'r')
    server_events = []
    starting_log_no = log_no
    for line in itertools.islice(log_file, log_no, None):  # ERROR: UnicodeDecodeError is raised here
        server_events.append(line)
        print(line.replace('\n', '').replace('\r', ''))
        log_no += 1
    self.last_log_no = log_no
    if (starting_log_no < log_no):
        return server_events
    return False
Any help or advice would be appreciated!
Recommended answer
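The byte string \xed\xa0\xbd\xed\xb1\x8c is not valid utf-8: it is a pair of UTF-16 surrogates (U+D83D, U+DC4C) written out as UTF-8-style bytes, which strict utf-8 rejects. Neither is it us-ascii, since us-ascii is a 7-bit encoding and \x8c (140), for example, is greater than 127.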
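To see this for yourself, here is a minimal sketch that feeds the bytes from the traceback to a few codecs. The helper name try_decode is made up for illustration. The last two lines assume the third-party application wrote an emoji as two separately encoded surrogates, which is a guess based on the byte pattern and not something the log confirms.

```python
raw = b"\xed\xa0\xbd\xed\xb1\x8c"  # the bytes quoted in the traceback


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_result = try_decode(raw, "utf-8")      # None: strict utf-8 rejects these bytes
ascii_result = try_decode(raw, "ascii")     # None: every byte here is above 127
latin1_result = try_decode(raw, "latin-1")  # six characters, one per byte

# Assumption: the bytes are two UTF-16 surrogates encoded one at a time.
# "surrogatepass" lets the utf-8 codec accept them, and a round trip through
# utf-16 joins the pair into a single code point.
surrogates = raw.decode("utf-8", errors="surrogatepass")
recovered = surrogates.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")

print(utf8_result, ascii_result, len(latin1_result), hex(ord(recovered)))
```

Note that latin-1 never fails, but it yields six unrelated characters, not the text the user typed.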
Instead of ignoring the UnicodeDecodeError, try opening the file with an encoding that assigns a character to every possible byte value (e.g. latin-1), so decoding can never fail:
log_file = open(self.path, 'r', encoding='latin-1')
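If you would rather keep utf-8 for the rest of the log, a different option from the one above is the errors argument of open(), which tells the decoder to replace or drop undecodable bytes and carry on. The sketch below compares both approaches on a throwaway file. The read_lines helper and the sample log contents are made up for illustration and are not taken from the real log.

```python
import itertools
import os
import tempfile


def read_lines(path, start, **open_kwargs):
    """Hypothetical helper: return the lines from index `start` onward."""
    with open(path, "r", **open_kwargs) as log_file:
        return list(itertools.islice(log_file, start, None))


# Made-up sample log with the problematic bytes on the second line.
sample = b'line one\r\nsay "\xed\xa0\xbd\xed\xb1\x8c"\r\nline three\r\n'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Strict utf-8 fails, as in the question.
    try:
        read_lines(path, 0, encoding="utf-8")
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # latin-1 reads every line, but shows the bad bytes as unrelated characters.
    latin1 = read_lines(path, 0, encoding="latin-1")

    # errors="replace" substitutes U+FFFD for each undecodable byte.
    replaced = read_lines(path, 0, encoding="utf-8", errors="replace")

    # errors="ignore" silently drops the undecodable bytes.
    ignored = read_lines(path, 0, encoding="utf-8", errors="ignore")
finally:
    os.remove(path)

print(strict_failed, len(latin1), len(replaced), len(ignored))
```

With errors='replace' the damaged message stays visible as replacement characters, which can help when debugging. With errors='ignore' the bad bytes disappear without a trace. In both cases the valid utf-8 text elsewhere in the log still decodes correctly, which latin-1 does not guarantee for non-ASCII text.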