python - encodings.utf_8.StreamReader readline(), read() and seek() don't cooperate

Consider this very simple example.

import codecs
from io import BytesIO

string = b"""# test comment
Some line without comment
# another comment
"""

reader = codecs.getreader("UTF-8")
stream = reader(BytesIO(string))

lines = []
while True:
    # get current position
    position = stream.tell()

    # read first character
    char = stream.read(1)

    # return cursor to start
    stream.seek(position, 0)

    # end of stream
    if char == "":
        break

    # line is not comment
    if char != "#":
        lines.append(stream.readline())
        continue

    # line is comment. Skip it.
    stream.readline()

print(lines)
assert lines == ["Some line without comment\n"]

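I am trying to read from a StreamReader line by line: if a line starts with #, I skip it; otherwise I store it in a list. However, when I use the seek() method, I get some strange behavior. It seems that seek() and readline() don't cooperate and move the cursor far ahead. The resulting list is empty.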

Of course I could do this differently. But, as I wrote above, this is a very simple example, and it would help me understand how these things work together.

I am using Python 3.5.

Best answer

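You don't want to use the codecs stream readers. They are an old, outdated attempt at implementing layered I/O for encoding and decoding text, and they have been superseded by the io module, which is a more robust and faster implementation. There have been serious calls for the stream readers to be deprecated.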
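To see concretely why the loop in the question misbehaves, here is a minimal sketch against CPython's codecs module, using the same sample data as the question. As far as CPython's implementation goes, codecs.StreamReader does not define tell() itself: the call is forwarded to the wrapped byte stream, which readline() has already read past to fill the reader's own text buffer. seek() then resets that buffer, so the text it was holding is silently discarded.

```python
import codecs
from io import BytesIO

data = b"# test comment\nSome line without comment\n# another comment\n"

reader = codecs.getreader("utf-8")(BytesIO(data))

first = reader.readline()
# StreamReader has no tell() of its own; this is forwarded to the BytesIO,
# which readline() has already read ahead of to buffer the following lines.
pos = reader.tell()
print(repr(first), pos, len(first.encode("utf-8")))

# seek() resets the reader's internal buffers, so the lines that readline()
# had buffered but not yet returned are lost.
reader.seek(pos, 0)
rest = reader.read()
print(repr(rest))
```

The position reported after reading only the first line is already beyond that line, which is why seeking back to it "moves the cursor far ahead".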

You really want to replace your use of the codecs.getreader() object with io.TextIOWrapper():

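from io import BytesIO, TextIOWrapper

string = b"""# test comment
Some line without comment
# another comment
"""

# Pass the encoding explicitly; otherwise TextIOWrapper falls back to the locale's preferred encoding.
stream = TextIOWrapper(BytesIO(string), encoding="utf-8")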

With that change the while loop works, and lines ends up as ['Some line without comment\n'].
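As a quick check, here is the loop from the question run unchanged against a TextIOWrapper. Per the io documentation, TextIOWrapper.tell() returns an opaque number rather than a byte offset, and seeking back to such a value restores the text position even though readline() has read ahead in the underlying buffer.

```python
from io import BytesIO, TextIOWrapper

string = b"""# test comment
Some line without comment
# another comment
"""

stream = TextIOWrapper(BytesIO(string), encoding="utf-8")

lines = []
while True:
    # tell() returns an opaque cookie, not a byte offset into the BytesIO
    position = stream.tell()
    char = stream.read(1)
    # seeking back to the cookie restores the exact text position
    stream.seek(position, 0)
    if char == "":
        break  # end of stream
    if char != "#":
        lines.append(stream.readline())
        continue
    stream.readline()  # comment line: skip it

print(lines)
```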

You also don't need seek() or tell() here at all. You can loop directly over the file object (including a TextIOWrapper() object):

lines = []
for line in stream:
    if not line.startswith('#'):
        lines.append(line)

Or even:

lines = [l for l in stream if not l.startswith('#')]

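If you are worried that the TextIOWrapper() object will close the underlying stream when the wrapper itself is closed or garbage-collected, simply detach the wrapper first: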

stream.detach()
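A small sketch of the difference, using a hypothetical pair of BytesIO buffers named closed_buffer and kept_buffer: closing a wrapper also closes the buffer it wraps, while detach() hands the buffer back (leaving the wrapper unusable) and keeps the buffer open.

```python
from io import BytesIO, TextIOWrapper

data = b"# test comment\nSome line without comment\n# another comment\n"

# Without detach(): closing the wrapper closes the wrapped BytesIO as well.
closed_buffer = BytesIO(data)
wrapper = TextIOWrapper(closed_buffer, encoding="utf-8")
wrapper.close()
print(closed_buffer.closed)

# With detach(): the wrapper returns the underlying buffer and can no longer
# be used, but the buffer itself stays open.
kept_buffer = BytesIO(data)
stream = TextIOWrapper(kept_buffer, encoding="utf-8")
lines = [line for line in stream if not line.startswith("#")]
raw = stream.detach()
print(raw is kept_buffer, kept_buffer.closed)
```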

Regarding "python - encodings.utf_8.StreamReader readline(), read() and seek() don't cooperate", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/54349150/