python - Decoding a Python 2 `tempfile` with python-future

I'm trying to write a Python 2/3 compatible routine that fetches a CSV file, decodes it from `latin_1` into Unicode, and feeds it to `csv.DictReader` in a robust, scalable way.

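For Python 2/3 support, I'm using python-future, including importing `open` from `builtins` and importing `unicode_literals` for consistent behaviour.

I'm hoping to handle exceptionally large files by spilling to disk, using `tempfile.SpooledTemporaryFile`.

I'm using `io.TextIOWrapper` to decode from the `latin_1` encoding before feeding the stream to `DictReader`.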

This all works fine under Python 3.

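The problem is that `TextIOWrapper` expects to wrap a stream that conforms to `BufferedIOBase`. Unfortunately, under Python 2, although I have imported the Python 3-style `open`, the vanilla Python 2 `tempfile.SpooledTemporaryFile` is of course still backed by a Python 2 `cStringIO.StringO` (its `_file` attribute), not the Python 3 `io.BytesIO` that `TextIOWrapper` requires.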
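For comparison, here is why the Python 3 path works: `io.BytesIO` is a subclass of `io.BufferedIOBase` and has `readable()`, so `TextIOWrapper` can decode it lazily. A minimal sketch, assuming in-memory sample bytes instead of a download; `dictreader_from_latin1` is a hypothetical helper name, not part of the original code.

```python
import csv
import io


def dictreader_from_latin1(raw_bytes):
    """Decode latin_1 bytes lazily and parse them as CSV (Python 3)."""
    buffer = io.BytesIO(raw_bytes)  # a BufferedIOBase, so readable() exists
    # newline='' lets the csv module handle line endings itself
    text = io.TextIOWrapper(buffer, encoding='latin_1', newline='')
    return csv.DictReader(text)


sample = 'name,city\nJosé,Zürich\n'.encode('latin_1')
rows = list(dictreader_from_latin1(sample))
```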

I can think of these possible approaches:

Wrap the Python 2 `cStringIO.StringO` as a Python 3-style `io.BytesIO`. I'm not sure how to approach this: would I need to write such a wrapper, or does one already exist?
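One way to build such a wrapper is to subclass `io.RawIOBase`: `io.BufferedReader` only needs `readable()` and `readinto()` from it, and `TextIOWrapper` can then sit on top. This is a sketch under assumptions: `ReadableAdapter` is a hypothetical name, `LegacyReader` is a stand-in for a `cStringIO.StringO` (it has `read()` but no `readable()`), and I have only reasoned about it under Python 3, so its Python 2 behaviour with a real `cStringIO` object should be verified.

```python
import csv
import io


class ReadableAdapter(io.RawIOBase):
    """Hypothetical adapter: expose any object with read(size) as raw io."""

    def __init__(self, legacy):
        super(ReadableAdapter, self).__init__()
        self._legacy = legacy

    def readable(self):
        return True

    def readinto(self, b):
        # Fill the caller's buffer from the legacy stream; 0 signals EOF.
        data = self._legacy.read(len(b))
        n = len(data)
        b[:n] = data
        return n


class LegacyReader(object):
    """Stand-in for cStringIO.StringO: read() only, no readable()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


legacy = LegacyReader('name,city\nJosé,Zürich\n'.encode('latin_1'))
buffered = io.BufferedReader(ReadableAdapter(legacy))
text = io.TextIOWrapper(buffered, encoding='latin_1', newline='')
rows = list(csv.DictReader(text))
```

In the question's routine, the adapter would wrap `raw_file` itself (whose `read()` delegates to `_file`), which also avoids touching the private attribute.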

Find a Python 2 alternative to `TextIOWrapper` that can wrap a `cStringIO.StringO` stream for decoding. I haven't found one yet.
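A candidate alternative is `codecs.getreader`, which exists in both Python 2 and 3 and returns a `StreamReader` class that decodes from any object with a `read()` method, without requiring `readable()`. A sketch under Python 3 with a hypothetical helper name; on Python 2 the stdlib `csv` module does not accept Unicode input, so a Python 3-style replacement such as `backports.csv` would still be needed.

```python
import codecs
import csv
import io

# StreamReader class that decodes latin_1 on the fly from any read()-able object
Latin1Reader = codecs.getreader('latin_1')


def dictreader_via_codecs(byte_stream):
    """Hypothetical helper: decode with codecs instead of TextIOWrapper."""
    # StreamReader is iterable line by line, which is all csv needs
    return csv.DictReader(Latin1Reader(byte_stream))


rows = list(dictreader_via_codecs(
    io.BytesIO('name,city\nJosé,Zürich\n'.encode('latin_1'))))
```

One caveat, as far as I can tell: `StreamReader.readline()` splits on every Unicode line boundary, including U+0085, which `latin_1` produces from byte `0x85`, so a stray `0x85` byte inside a field would split a row. `TextIOWrapper` with `newline=''` only splits on `\n`, `\r` and `\r\n`.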

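Do away with `SpooledTemporaryFile` and decode entirely in memory. How big would the CSV file need to be before operating entirely in memory becomes a concern?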
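The in-memory option is the simplest; a sketch, assuming the download is available as an iterable of byte chunks (as `iter_content()` yields) and using a hypothetical helper name:

```python
import csv
import io


def dictreader_in_memory(chunks):
    """Join byte chunks, decode once, and parse from an in-memory text stream."""
    raw = b''.join(chunks)        # the whole file held as bytes...
    text = raw.decode('latin_1')  # ...and again as decoded text
    return csv.DictReader(io.StringIO(text, newline=''))


rows = list(dictreader_in_memory(
    [b'name,city\n', 'José,Zürich\n'.encode('latin_1')]))
```

On memory cost: under CPython 3, a string decoded from `latin_1` stores one byte per character (PEP 393), so while both `raw` and `text` are alive the peak is roughly twice the file size, before counting the parsed rows.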

Do away with `SpooledTemporaryFile` and implement my own spill-to-disk. This would allow me to call `open` from python-future, but I'd rather not, as it would be very tedious and probably less secure.
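A do-it-yourself spool may be less work than it sounds. The sketch below is a hypothetical `SimpleSpool` class, not a library API: it keeps bytes in an `io.BytesIO` until `max_size` is exceeded, then moves them to a file created with `tempfile.mkstemp` and opened through `io.open` (which, as I understand it, is what python-future's `open` is on Python 2), so the active stream is always an `io` object that `TextIOWrapper` accepts. It is only exercised here under Python 3.

```python
import csv
import io
import os
import tempfile


class SimpleSpool(object):
    """Hypothetical spill-to-disk buffer built only from io objects."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.rolled = False
        self.path = None
        self.stream = io.BytesIO()

    def write(self, data):
        self.stream.write(data)
        if not self.rolled and self.stream.tell() > self.max_size:
            self._rollover()

    def _rollover(self):
        # mkstemp creates the file without race conditions, readable and
        # writable only by the current user
        fd, self.path = tempfile.mkstemp()
        disk = io.open(fd, 'w+b')  # io.open also accepts a file descriptor
        disk.write(self.stream.getvalue())
        self.stream = disk
        self.rolled = True

    def text_reader(self, encoding):
        """Rewind and wrap the active byte stream for decoding."""
        self.stream.seek(0)
        return io.TextIOWrapper(self.stream, encoding=encoding, newline='')

    def close(self):
        self.stream.close()
        if self.path is not None:
            os.remove(self.path)


spool = SimpleSpool(max_size=8)
for chunk in (b'name,city\n', 'José,Zürich\n'.encode('latin_1')):
    spool.write(chunk)
rows = list(csv.DictReader(spool.text_reader('latin_1')))
spool.close()
```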

What's the best way forward? Have I missed anything?

Imports:

from __future__ import (absolute_import, division,
                    print_function, unicode_literals)
from builtins import (ascii, bytes, chr, dict, filter, hex, input,  # noqa
                  int, map, next, oct, open, pow, range, round,  # noqa
                  str, super, zip)  # noqa
import csv
import tempfile
from io import TextIOWrapper
import requests

In `__init__`:

...
self._session = requests.Session()
...

Routine:

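def _fetch_csv(self, path):
    # Buffer the download in memory, rolling over to a temporary file
    # on disk once it exceeds the configured spool size
    raw_file = tempfile.SpooledTemporaryFile(
        max_size=self._config.get('spool_size')
    )
    # stream=True avoids loading the whole response into memory first;
    # iter_content() otherwise defaults to 1-byte chunks
    csv_r = self._session.get(self.url + path, stream=True)
    for chunk in csv_r.iter_content(chunk_size=8192):
        raw_file.write(chunk)
    raw_file.seek(0)
    # Decode latin_1 by wrapping the private underlying byte stream;
    # under Python 2, _file is a cStringIO.StringO, which lacks readable()
    text_file = TextIOWrapper(raw_file._file, encoding='latin_1')
    return csv.DictReader(text_file)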

Error:

...in _fetch_csv
    text_file = TextIOWrapper(raw_file._file, encoding='utf-8')
AttributeError: 'cStringIO.StringO' object has no attribute 'readable'

Best answer

Not sure if this helps. This situation is only vaguely similar to yours.

I wanted to use `NamedTemporaryFile` to create a CSV encoded in UTF-8 with OS-native line endings (possibly not quite standard), which is easily accommodated by using the Python 3-style `io.open`.

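The difficulty is that `NamedTemporaryFile` in Python 2 opens a byte stream, which causes problems with line endings. The solution I settled on was to create the temporary file, close it, and reopen it with `io.open`, which I think is slightly nicer than handling Python 2 and 3 as separate cases. The final piece is the excellent `backports.csv` library, which provides Python 3-style CSV handling in Python 2.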

from __future__ import absolute_import, division, print_function, unicode_literals
from builtins import str
import io, os, tempfile
from backports import csv  # Python 3-style csv module on Python 2

data = [["1", "1", "John Coltrane",  1926],
        ["2", "1", "Miles Davis",    1926],
        ["3", "1", "Bill Evans",     1929],
        ["4", "1", "Paul Chambers",  1935],
        ["5", "1", "Scott LaFaro",   1936],
        ["6", "1", "Sonny Rollins",  1930],
        ["7", "1", "Kenny Burrel",   1931]]

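## create CSV file
# delete=False keeps the file on disk after this handle is closed, so it
# can be reopened below (remove it with os.remove when done)
with tempfile.NamedTemporaryFile(delete=False) as temp:
    filename = temp.name

# Reopen as a Python 3-style text stream; newline='' lets the csv writer
# control line endings itself
with io.open(filename, mode='w', encoding="utf-8", newline='') as temp: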
    writer = csv.writer(temp, quoting=csv.QUOTE_NONNUMERIC, lineterminator=str(os.linesep))
    headers = ['X', 'Y', 'Name', 'Born']
    writer.writerow(headers)
    for row in data:
        print(row)
        writer.writerow(row)
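The answer stops after writing. A sketch of the full round trip under Python 3 with the stdlib `csv` module (on Python 2, `from backports import csv` as above would stand in for it); `roundtrip_csv` is a hypothetical name, and unlike the answer it deletes the file afterwards. `str(os.linesep)` is dropped because it is already text on Python 3.

```python
import csv
import io
import os
import tempfile


def roundtrip_csv(rows, headers):
    """Write rows to a closed-and-reopened temp file, then read them back."""
    # Create the file, then close the byte-mode handle immediately
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        filename = temp.name
    try:
        # Reopen in text mode with an explicit encoding for writing...
        with io.open(filename, mode='w', encoding='utf-8', newline='') as out:
            writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC,
                                lineterminator=os.linesep)
            writer.writerow(headers)
            writer.writerows(rows)
        # ...and again for reading with DictReader
        with io.open(filename, mode='r', encoding='utf-8', newline='') as inp:
            return list(csv.DictReader(inp))
    finally:
        os.remove(filename)  # delete=False means we must clean up ourselves


result = roundtrip_csv([["1", "1", "John Coltrane", 1926]],
                       ['X', 'Y', 'Name', 'Born'])
```

Read back without `QUOTE_NONNUMERIC`, every value returns as a string, including the unquoted year.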

A similar question about python - Decoding a Python 2 `tempfile` with python-future can be found on Stack Overflow: https://stackoverflow.com/questions/34823113/