Handling bytes-to-string conversion in Python 3 when not explicitly opening a file

Problem description

I am using the Requests module to authorise and then pull CSV content from a web API, and the script runs fine in Python 2.7. I now want to write the same script in Python 3.5, but I am running into this error:

"iterator should return strings, not bytes (did you open the file in text mode?)"

requests.get seems to return bytes rather than a string, which appears to be related to the encoding issues commonly seen when moving to Python 3.x. The error is raised on the third-from-last line, next(reader). In Python 2.7 this was not an issue because the csv functions were handled in 'wb' mode.
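
The error can be reproduced without any network access by handing csv.reader an iterable of bytes. The following is a minimal sketch; sample_lines is hypothetical data standing in for the lines yielded by the HTTP response and is not part of the original script.

```python
import csv

# Hypothetical sample data standing in for the byte lines an HTTP response yields.
sample_lines = [b'position,id,title', b'1,abc,Example']

try:
    next(csv.reader(sample_lines, delimiter=','))
    error_message = None
except csv.Error as exc:
    # In Python 3, csv.reader expects str lines and rejects bytes.
    error_message = str(exc)

print(error_message)
```

The exact wording in the parentheses of the message may differ between Python 3 versions, but it begins with "iterator should return strings, not bytes".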

This question is very similar, but because I am not opening a CSV file directly, I can't seem to force the response text to be decoded that way: "csv.Error: iterator should return strings, not bytes"

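import csv
import requests

# username, password, countrydict and datevalue are assumed to be defined elsewhere
countries = ['UK','US','CA']
datelist = [1,2,3,4]
baseurl = 'https://somewebsite.com/exporttoCSV.php'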

#--- For all date/cc combinations
for cc in countries:
    for d in datelist:

        #---Build API String with variables
        url = (baseurl + '?data=chart&output=csv' +
               '&dataset=' + str(d) +  # d is an int, so convert before concatenating
               '&cc=' + cc)

        #---Run API Call and create reader object
        r = requests.get(url, auth=(username, password))
        text = r.iter_lines()
        reader = csv.reader(text,delimiter=',')

        #---Write csv output to csv file with territory and date columns
        with open(cc + '_' + str(d) + '.csv', 'wt', newline='') as file:
            a = csv.writer(file)
            a.writerow(['position','id','title','kind','peers','territory','date']) #---Write header line
            next(reader) #---Skip original headers
            for i in reader:
                a.writerow(i +[countrydict[cc]] + [datevalue])

Recommended answer

Without being able to test your exact scenario, I believe this can be solved by changing text = r.iter_lines() to:

text = [line.decode('utf-8') for line in r.iter_lines()]

This should decode each line read by r.iter_lines() from a byte string into a str that csv.reader can use.
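
The list comprehension decodes every line up front. If the response is large, the same decoding can be done lazily. The sketch below is one way to do that, assuming the payload is UTF-8 as in the answer above; decode_lines is a hypothetical helper name and sample_lines stands in for r.iter_lines(). The standard library's codecs.iterdecode offers a similar incremental approach.

```python
import codecs
import csv

def decode_lines(byte_lines, encoding='utf-8'):
    """Lazily decode an iterable of byte strings, one line at a time."""
    for line in byte_lines:
        yield line.decode(encoding)

# Hypothetical stand-in for the byte lines produced by r.iter_lines().
sample_lines = [b'position,id', b'1,abc', b'2,def']

# Option 1: hand-written generator (hypothetical helper defined above).
rows = list(csv.reader(decode_lines(sample_lines), delimiter=','))

# Option 2: the standard library's incremental decoder.
rows_iterdecode = list(
    csv.reader(codecs.iterdecode(sample_lines, 'utf-8'), delimiter=',')
)

print(rows)
print(rows_iterdecode)
```

Either iterable can be passed to csv.reader in place of the decoded list, since csv.reader accepts any iterable that yields strings.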

My test case is as follows:

>>> import csv
>>> iter_lines = [b'1,2,3,4',b'2,3,4,5',b'3,4,5,6']
>>> text = [line.decode('utf-8') for line in iter_lines]
>>> text
['1,2,3,4', '2,3,4,5', '3,4,5,6']
>>> reader = csv.reader(text,delimiter=',')
>>> next(reader)
['1', '2', '3', '4']
>>> for i in reader:
...     print(i)
...
['2', '3', '4', '5']
['3', '4', '5', '6']
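
When the whole response fits comfortably in memory, another option is to decode the body once and wrap it in a file-like object. This is a sketch under assumptions rather than a tested fix for the original script: payload is hypothetical data standing in for the raw response body (r.content in Requests), and UTF-8 is assumed as the encoding. Requests also exposes the already-decoded body as r.text, which could be passed to io.StringIO directly.

```python
import csv
import io

# Hypothetical payload standing in for the raw response body (bytes).
payload = b'position,id\r\n1,abc\r\n2,def\r\n'

# Decode once, then expose the text through a file-like object.
# newline='' leaves line endings untouched so the csv module can handle them.
text_stream = io.StringIO(payload.decode('utf-8'), newline='')

reader = csv.reader(text_stream, delimiter=',')
header = next(reader)     # skip the original header row
data_rows = list(reader)  # remaining rows as lists of str

print(header)
print(data_rows)
```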
