Problem description
I have multiple CSV files that I need to parse in a loop to gather information. The problem is that, although they all share the same format, some are delimited by '\t' and others by ','. After splitting, I want to remove the double quotes from around each string.
Can Python split on multiple possible delimiters?
At the moment, I can split each line on a single delimiter using:
with open(filename, "r") as f:
    for fs in f:
        # Split the line on tabs only
        sf = fs.rstrip("\n").split('\t')
        # Strip the surrounding double quotes from each field
        tf = [fi.strip('"') for fi in sf]
Recommended answer
Splitting the file by hand like that is not a good idea: once you split on both tabs and commas, it will fail whenever one of the fields contains a comma. For example, in a tab-delimited file, the line "field1"\t"Hello, world"\t"field3" will be split into 4 fields instead of 3.
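The failure can be reproduced with a short sketch. It assumes the naive approach is implemented with re.split on a character class containing both delimiters, which is one common way to split on several delimiters at once; the variable names are illustrative only.

```python
import re

# The example line from above: tab-delimited, with a comma inside a quoted field.
line = '"field1"\t"Hello, world"\t"field3"'

# Naive approach: split on either a tab or a comma, then strip the quotes.
naive = [part.strip('"') for part in re.split(r"[\t,]", line)]

# The quoted field has been cut in two at its embedded comma.
print(len(naive), naive)
```

The regular expression has no notion of quoting, so it cannot tell a delimiter from a comma that is part of the data.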
Instead, you should use the csv module. It contains the helpful Sniffer class, which can detect which delimiter is used in a file. The csv module will also remove the double quotes for you.
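import csv

with open("example.csv", newline="") as csvfile:
    # Guess the dialect (delimiter, quoting) from the first 1024 characters
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    # Rewind so the reader starts from the beginning of the file
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    for line in reader:
        # process line (a list of fields, quotes already removed)
        print(line)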
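Since the question involves several files in a loop, the same idea can be wrapped in a helper. The following is only a sketch: the function name read_rows, the sample file names, and the fallback to csv.excel (comma-delimited) when sniffing fails are assumptions for illustration, not part of the original answer. Passing delimiters="\t," restricts the Sniffer to the two candidates mentioned in the question.

```python
import csv
import os
import tempfile


def read_rows(path, sample_size=1024):
    """Return all rows of a file delimited by either tabs or commas."""
    with open(path, newline="") as csvfile:
        sample = csvfile.read(sample_size)
        csvfile.seek(0)
        try:
            # Only consider the two delimiters we expect to see.
            dialect = csv.Sniffer().sniff(sample, delimiters="\t,")
        except csv.Error:
            # Assumption: fall back to plain comma-delimited CSV.
            dialect = csv.excel
        return list(csv.reader(csvfile, dialect))


# Hypothetical sample files, one per delimiter, written to a temp directory.
samples = {
    "tabs.csv": '"field1"\t"Hello, world"\t"field3"\n"a"\t"b"\t"c"\n',
    "commas.csv": '"field1","Hello world","field3"\n"a","b","c"\n',
}

results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in samples.items():
        path = os.path.join(tmpdir, name)
        with open(path, "w", newline="") as f:
            f.write(text)
        results[name] = read_rows(path)

for name, rows in results.items():
    print(name, rows)
```

Sniffing is a heuristic, so for very short or unusual files it may be worth checking the detected dialect.delimiter before trusting the result.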