Problem description
I have a text file (.txt) that may be either tab-separated or pipe-separated, and I need to convert it to CSV. I am using Python 2.6. Can anyone suggest how to identify the delimiter in a text file, read the data, and then convert it into a comma-separated file?
Thanks in advance.
Recommended answer
I fear that you can't identify the delimiter without already knowing what it is. The problem with CSV is that, quoting ESR:
If the delimiter can appear inside fields, it has to be escaped somehow. Without knowing how the escaping is done, identifying the delimiter automatically is difficult. Escaping could be done the UNIX way, with a backslash '\', or the Microsoft way, by wrapping the field in quotes, which then have to be escaped too. This is not a trivial task.
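To make the ambiguity concrete, here is a small sketch (Python 3, standard csv module; the sample strings and the parse helper are made up for illustration) that reads the same three fields, a|b, c and d, written with each escaping convention. The UNIX style needs quoting=csv.QUOTE_NONE plus an escapechar; the Microsoft style works with csv's defaults. Parsing the backslash-escaped line with the default quote rules silently yields four fields instead of three, which is why guessing the convention is risky.

```python
import csv
import io

# Two encodings of the same three fields: 'a|b', 'c', 'd'
unix_style = 'a\\|b|c|d\n'       # UNIX way: backslash before the embedded delimiter
microsoft_style = '"a|b"|c|d\n'  # Microsoft way: quote the field (embedded quotes get doubled)


def parse(text, **fmt):
    """Parse pipe-delimited text using the given csv formatting parameters (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter='|', **fmt))


# Backslash escaping: turn off quote handling and declare the escape character
print(parse(unix_style, quoting=csv.QUOTE_NONE, escapechar='\\'))
# Quote-based escaping: csv's defaults (quotechar='"', doublequote=True)
print(parse(microsoft_style))
# Reading backslash-escaped data with the quote convention splits the field wrongly
print(parse(unix_style))
```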
So my suggestion is to get full documentation from whoever generates the file you want to convert. Then you can use one of the approaches suggested in the other answers or some variant.
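If the documentation tells you the delimiter, no guessing is needed: pass it to csv.reader and write the rows with csv.writer, whose default dialect produces comma-separated output and quotes any field that contains a comma. A minimal sketch, assuming Python 3 and that the input quotes fields the Microsoft way (csv's default); convert_to_csv is a hypothetical helper name. On Python 2.6 you would instead open real files in binary mode ('rb', 'wb'), as the csv module requires there.

```python
import csv
import io


def convert_to_csv(src, dst, delimiter):
    """Copy delimited text from file object src to dst as comma-separated CSV.

    `delimiter` is whatever the file's producer documents ('\t' or '|').
    Quoting is assumed to follow csv's default (Microsoft-style) rules.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst)  # default 'excel' dialect: commas, quote only when needed
    writer.writerows(reader)


# Usage with in-memory files; for real files in Python 3, open them with newline=''
tab_input = io.StringIO('name\tcity\nAda\tLondon, UK\n')
out = io.StringIO()
convert_to_csv(tab_input, out, delimiter='\t')
print(out.getvalue())
```

The field "London, UK" comes out quoted because it now contains the output delimiter.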
Edit:
Python provides csv.Sniffer, which can help you deduce the format of your DSV (delimiter-separated values) file. If your input looks like this (note the quoted delimiter in the first field of the second row):
a|b|c
"a|b"|c|d
foo|"bar|baz"|qux
You can do this:
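import csv
# Python 2: files passed to the csv module should be opened in binary mode
csvfile = open("csvfile.csv", "rb")
# Guess the dialect (delimiter, quote character, ...) from the first 1024 bytes
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)  # rewind so the reader sees the whole file
# DictReader takes the first row (a|b|c) as the field names
reader = csv.DictReader(csvfile, dialect=dialect)
for row in reader:
    print row,
# => {'a': 'a|b', 'c': 'd', 'b': 'c'} {'a': 'foo', 'c': 'qux', 'b': 'bar|baz'}
# write records using another dialect, e.g. csv.writer's default comma-separated 'excel' dialect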
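The last step, writing the records out as comma-separated values, could look like the following sketch (Python 3; sniff_and_convert is a hypothetical name). It passes delimiters='\t|' to Sniffer.sniff so the guess is limited to the two formats in the question, rewinds the input, and re-emits every row with csv.writer's default comma dialect. Sniffing is still a heuristic: a small or irregular sample can make sniff raise csv.Error, so the caveats about escaping above still apply.

```python
import csv
import io


def sniff_and_convert(src, dst, sample_size=1024):
    """Guess whether src is tab- or pipe-delimited, then write it to dst as CSV.

    Returns the delimiter the sniffer chose.
    """
    sample = src.read(sample_size)
    # Restrict the guess to the two delimiters we expect to see
    dialect = csv.Sniffer().sniff(sample, delimiters='\t|')
    src.seek(0)  # rewind so the reader starts at the beginning
    csv.writer(dst).writerows(csv.reader(src, dialect))
    return dialect.delimiter


# Usage with the pipe-separated sample from the answer
pipe_input = io.StringIO('a|b|c\n"a|b"|c|d\nfoo|"bar|baz"|qux\n')
out = io.StringIO()
print(repr(sniff_and_convert(pipe_input, out)))
print(out.getvalue())
```

Using plain csv.reader rather than DictReader keeps the header row as an ordinary row, so it is copied to the output unchanged.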