Problem description
I have been trying to figure out how to read a UTF-8 CSV that I downloaded into a DataFrame. So far I have tried
import pandas as pd
df = pd.read_csv('myfile.csv', encoding='utf8')
and it gives me garbage. I am having success reading it in with
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
as suggested in this post, but it reads in this gigantic file and I cannot get it into a DataFrame.
I'm using Python 3. Thanks for helping!
My specific error output is
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 3: invalid start byte
The file I am trying to work with is one of the YEARLY CSV files downloaded from this link (not the WEEKLY ones; I am not sure whether the weekly files use a different format):
https://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=2&index=0
Recommended answer
I fixed it thanks to the post at this question:
'utf-8' codec can't decode byte 0x92 in position 18: invalid start byte
I thought I would try the fix that they suggested
df = pd.read_csv('myfile.csv', encoding='cp1252')
and it worked! The file is encoded in Windows code page 1252, not UTF-8.
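For anyone who hits the same error and is not sure which encoding a file uses, here is a minimal sketch of one way to check before calling pandas. It uses only the standard library and simply tries each candidate encoding in order. The helper name `detect_encoding`, the candidate list, and the demo file contents are my own assumptions for illustration, not part of pandas or of the NIH files. Note that cp1252 accepts almost any byte sequence, so a successful decode means "this encoding does not raise an error", not "this encoding is definitely right".

```python
import os
import tempfile


def detect_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: the candidate order matters, because cp1252
    rarely fails and should therefore be tried last.
    """
    for enc in candidates:
        try:
            with open(path, encoding=enc, newline="") as f:
                for _ in f:  # stream line by line so huge files are fine
                    pass
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path}")


# Demo with a made-up two-line file containing byte 0xa0, the same byte
# that the error message in the question complains about.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,name\n1,foo\xa0bar\n")
    demo_path = tmp.name

detected = detect_encoding(demo_path)
print(detected)  # cp1252
os.remove(demo_path)
```

With pandas installed, the result could then be passed straight through, for example `pd.read_csv('myfile.csv', encoding=detect_encoding('myfile.csv'))`.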