Python: reading and writing CSV files with UTF-8 encoding

Problem description


I'm trying to read a CSV file whose header contains foreign characters, and I'm having a lot of problems with this.


First of all, I'm reading the file with a simple csv.reader:

import csv
import numpy as np

# mkNum is defined elsewhere in the script
filename = 'C:\\Users\\yuval\\Desktop\\בית ספר\\עבודג\\new\\resources\\mk'+ str(mkNum) + 'Data.csv'
raw_data = open(filename, 'rt', encoding="utf8")
reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
x = list(reader)
header = x[0]
data = np.array(x[1:]).astype('float')


The variable header should be a list that contains the file's headers, but the list it returns to me is:

['\ufeff"dayPart"', '"length"', '"ifPhoto"', '"ifVideo"', '"ifAlbum"', '"לא"', '"הוא"', '"בכל"', '"אותם"', '"זה"', '"הם"', '"כדי"', '"את"', '"יש"', '"לי"', '"היא"', '"אני"', '"רק"', '"להם"', '"על"', '"עם"', '"של"', '"המדינה"', '"כל"', '"גם"', '"הזה"', '"אם"', '"ישראל"', '"לכל"', '"מי"', '"ל"', '"אמסלם"', '"לנו"', '"אבל"', '"זו"', '"אין"', '"שבת"', '"שלום"', '"כ"', '"שלנו"', '"היום"', '"ומבורך"', '"ח"', '"דודי"', '"ר"', '"הפנים"', '"מה"', '"כי"', '"ה"', '"אחד"', '"ולא"', '"יותר"']


I don't know why it adds \ufeff to the first element, or why every element is wrapped in double quotation marks.
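The symptom can be reproduced with a small in-memory sketch. The file contents below are hypothetical: a UTF-8 byte order mark (U+FEFF) followed by fully quoted fields, shaped like the header above. Decoding with plain "utf-8" keeps the BOM as an ordinary first character. csv.QUOTE_NONE tells the reader not to treat '"' specially, so the quotes remain part of each field.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM (U+FEFF) followed by fully quoted
# fields, shaped like the header in the question.
raw_bytes = '\ufeff"dayPart","length","שלום"\r\n"1","2","3"\r\n'.encode("utf-8")

# Decoding with plain "utf-8" keeps the BOM as the first character of the text.
text = raw_bytes.decode("utf-8")

# With csv.QUOTE_NONE the reader performs no special processing of '"',
# so the quotes stay inside each field.
reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quoting=csv.QUOTE_NONE)
header = next(reader)
# header == ['\ufeff"dayPart"', '"length"', '"שלום"']
```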


After that, I need to write to another CSV file and also use foreign characters in its header. I tried to do it as follows, but it wrote the characters as weird symbols:

with open('C:\\Users\\yuval\\Desktop\\בית ספר\\עבודג\\new\\variance reduction 1\\mk'+ str(mkNum) + 'Data.csv', 'w', newline='', encoding='utf8') as csvFile:
    csvWriter = csv.writer(csvFile, delimiter=',')
    csvWriter.writerow(newHeader)


Does anyone know how to fix this problem and work with UTF-8 encoding in the CSV file's header?

Recommended answer


You report three separate problems. This is a bit of a shot in the dark, because there isn't enough information to be sure, but you should try the following:

  1. Input encoding: As suggested in the comments, try "utf-8-sig". This codec removes the byte order mark (BOM) from your input.


  2. Double quotes: Among the csv parameters, you specify quoting=csv.QUOTE_NONE. This tells the csv library that the CSV table was written without quotes (quotes are used to escape characters that could otherwise be mistaken for field or row separators). However, that is apparently not the case, since the input has quotes around each field. Try csv.QUOTE_MINIMAL (the default) or csv.QUOTE_ALL instead.


  3. Output encoding: You say the output contains "weird symbols". I suspect that the output is actually fine, but you are using a tool that doesn't display UTF-8 text properly by default. Many Windows applications (such as Excel) still prefer UTF-16 and localised 8-bit encodings like CP-1255. As with problem 1, try the codec "utf-8-sig": many viewers and editors treat the BOM as an encoding hint.
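The three suggestions can be combined into one read/write round trip. This is a minimal sketch, not the asker's actual setup. It assumes a hypothetical sample file (a BOM plus fully quoted fields) written to a temporary directory, and the file names are placeholders. The sketch reads with "utf-8-sig" and the default quoting, then writes the header back out with "utf-8-sig".

```python
import csv
import os
import tempfile

# Hypothetical sample: a UTF-8 BOM plus fully quoted fields, mimicking the
# header shown in the question. File names below are placeholders.
sample = '\ufeff"dayPart","length","שלום"\r\n"1","2","3"\r\n'

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.csv")
    out_path = os.path.join(tmp, "output.csv")
    with open(in_path, "w", encoding="utf-8", newline="") as f:
        f.write(sample)

    # Input encoding + double quotes: "utf-8-sig" drops the leading BOM, and
    # the default quoting (csv.QUOTE_MINIMAL) lets the reader strip the quotes.
    with open(in_path, "r", encoding="utf-8-sig", newline="") as f:
        rows = list(csv.reader(f, delimiter=","))
    header = rows[0]  # ['dayPart', 'length', 'שלום']

    # Output encoding: "utf-8-sig" writes a BOM at the start of the file,
    # which many Windows tools treat as a hint that the text is UTF-8.
    with open(out_path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, delimiter=",").writerow(header)

    # Read the raw bytes back to confirm the BOM is present.
    with open(out_path, "rb") as f:
        written = f.read()
```

For a real file, the same pattern means changing encoding="utf8" to encoding="utf-8-sig" in both open() calls and dropping quoting=csv.QUOTE_NONE from the reader.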


08-01 05:04