Problem description
I'm trying to read a CSV text file (UTF-8 without BOM, according to Notepad++) using Python. However, there seems to be a problem with the encoding:
print(open(path, encoding="utf-8").read())
This little character seems to be the problem:
●
(full string: "●• อีเปียขี้บ่น ت •●"), however I'm sure there will be more.
If I try UTF-16, then there is a message:
# also tried with encode
print(open(path, encoding="utf-16").read().encode('utf-8'))
Even when I try opening it with an automatic codec finder, I receive the error.
import codecs
import csv
def csv_unireader(f, encoding="utf-8"):
    for row in csv.reader(codecs.iterencode(codecs.iterdecode(f, encoding), "utf-8")):
        yield [e.decode("utf-8") for e in row]
What am I overlooking? The file contains Twitter texts, which certainly contain a lot of different characters. But surely just reading and printing a file can't be such a difficult task in Python?
Edit:
Just tried using the code from this answer: https://stackoverflow.com/a/14786752/45311
import csv
with open('source.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
This at least prints some rows to the screen, but it also throws an error after some rows:
It seems to automatically use CP850, which is another encoding... I can't make sense of all this.
Solution
Which version of Python are you using? If it is 2.x, try adding this import at the beginning of your script:
from __future__ import unicode_literals
Then try:
print(open(path).read().encode('utf-8'))
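The `encoding=` argument passed to `open` in the question suggests Python 3. In that case the CP850 in the error is probably the console's encoding rather than the file's: the file is read correctly as UTF-8, and the failure happens when `print` tries to encode a character such as ● for the console. The following is a minimal sketch under that assumption. `safe_print` is a hypothetical helper name, not part of any library.

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, replacing characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the stream's encoding; unencodable characters become "?".
    printable = text.encode(encoding, errors="replace").decode(encoding)
    print(printable, file=stream)


if __name__ == "__main__":
    # Simulate a CP850 console with an in-memory stream (an assumption for the demo).
    fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp850")
    safe_print("●• อีเปียขี้บ่น ت •●", stream=fake_console)
    fake_console.flush()
    print(fake_console.buffer.getvalue())
```

On Python 3.7 or later, calling `sys.stdout.reconfigure(errors="replace")` once at startup should have a similar effect without a helper function. Either way the characters are lost in the console output only; the data read from the file stays intact.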
There is also a great tool for charset detection: chardet. I hope this helps.
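As a rough sketch of how chardet might be used here: `chardet.detect` takes raw bytes and returns a dictionary that includes an `encoding` guess and a `confidence` value. The helper names `first_decodable` and `guess_encoding` and the candidate list are assumptions made for illustration. chardet is a third-party package, so the sketch falls back to trial decoding when it is not installed.

```python
def first_decodable(raw, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def guess_encoding(raw, candidates=("utf-8", "cp1252")):
    """Guess the encoding of raw bytes, using chardet when it is available."""
    try:
        import chardet  # third-party: pip install chardet
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if guess["encoding"]:
            return guess["encoding"]
    # chardet is missing or had no guess: fall back to trial decoding.
    return first_decodable(raw, candidates)


if __name__ == "__main__":
    sample = "●• อีเปียขี้บ่น ت •●".encode("utf-8")
    print(guess_encoding(sample))
```

The detected name is only a guess, especially for short inputs, so it is worth reading a reasonably large chunk of the file in binary mode (`open(path, "rb")`) before passing it in.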