我已经进行了彻底的搜索,试图找到一个没有问题的答案。
为了更好地解释我的问题,我的任务是合并多个.csv文件,同时还要做一些其他事情。例如,假设我有三个名为run_1.csv,run_2.csv和run_3.csv的文件,它们都位于一个名为/ runs /的目录中
run_1.csv看起来像:
Name, Mass (kg), run_1
One, 1, 5.4
Two, 2, 4.5
Three, 3, 6.5
run_2.csv看起来像:
Name, Mass (kg), run_2
One, 1, 5.7
Two, 2, 6.7
和run_3.csv看起来像:
Name, Mass (kg), run_3
One, 1, 4.7
Three, 3, 5.9
Four, 4, 2.0
我希望我的输出文件(output.csv)看起来像这样:(注意,行的顺序无关紧要)
Name, Mass (kg), run_1, run_2, run_3
One, 1, 5.4, 5.7, 4.7
Two, 2, 4.5, 6.7,
Three, 3, 6.5, , 5.9
Four, 4, , , 2.0
目前,我正在使用csv模块,并且已经完成了以下工作:
import os
import csv
fields = ['name', 'mass', 'run_1', 'run_2', 'run_3']
with open('output.csv', 'wb') as csvfile :
writer = csv.writer(csvfile, delimiter=",")
writer.writerow(fields) #write the header
file_names= []
for file in os.listdir(/runs/):
file_names.append(file)
with open(/runs/+file_name+'.csv', 'rb') as infile:
reader = csv.reader(infile)
reader.next() #just skipping the first row, the header
entries = set()
for row in reader:
line = []
key = row[0]
time = row[2]
if key not in entries:
row.remove(row[-1])
line.extend(row)
for number in images_full:
line.append('')
line.insert(fields.index(file_name.strip('.csv')), time)
writer.writerow(line)
elif key in entries:
row.remove(row[-1])
line.extend(row)
for number in images_full:
line.append('')
line.insert(fields.index(file_name.strip('.csv')), time)
writer.writerow(line) #BUT, I only want it too add this data into the missing spot, not overwrite the whole line!
因此,我很茫然,不胜感激。输入的csv文件可以更改,但是我相信有一种方法可以实现。
编辑:这是通过将原始csv读入字典中,然后将其写出后解决的,如下所示:
counter = 0
with open(/result+total_data_file_name, 'wb') as outfile:
writer = csv.writer(outfile)
writer.writerow(fields)
fields.pop(0)
for names in result.keys():
line = []
name = result.keys()[counter]
line.append(name)
for field_key in fields:
try:
line.append(result[name][field_key])
except KeyError:
line.append('')
counter += 1
writer.writerow(line)
最佳答案
这会将标题下的所有值放入字典中,以消除重复,您只需编写标题,然后编写键/值即可。
from collections import defaultdict
new_data_dict = {}
files = ["in.csv","in2.csv","in3.csv"]
for f in files:
with open(f) as f:
f.next()
for row in f:
row = row.strip().split(",")
new_data_dict.setdefault(row[0],set())
new_data_dict[row[0]].update(row[1:])
{'Four': set([' 2.0', ' 4']), 'Three': set([' 3', ' 6.5', ' 5.9']), 'Two': set([' 2', ' 6.7', ' 4.5']), 'One': set([' 5.7', ' 5.4', ' 1', ' 4.7'])}
写入数据:
import csv
new_data_dict = {}
files = ["in.csv","in2.csv","in3.csv"]
headers = set()
for f in files:
with open(f) as f:
headers.update(f.next().rstrip().split(",")[2:])
for row in f:
row = row.strip().split(",")
new_data_dict.setdefault(row[0],set())
new_data_dict[row[0]].update(row[1:])
headers = ["Name","Mass (kg)"] + sorted(headers,key=lambda x: int(x.split("_")[-1]))
with open("out.csv","w") as out:
writer = csv.writer(out)
writer.writerow(headers)
for k,v in new_data_dict.items():
writer.writerow([k]+list(v))
要保留订单:
for f in files:
with open(f) as f:
headers.update(f.next().rstrip().split(",")[2:])
for row in f:
row = row.strip().split(",")
new_data_dict.setdefault(row[0],[])
new_data_dict[row[0]]+= row[1:]
headers = ["Name","Mass (kg)"] + sorted(headers,key=lambda x: int(x.split("_")[-1]))
with open("out.csv","w") as out:
writer = csv.writer(out)
writer.writerow(headers)
for k,v in new_data_dict.items():
writer.writerow([k]+sorted(set(v),key=lambda x: new_data_dict[k].index(x)))
Name,Mass (kg), run_1, run_2, run_3
Four, 4, 2.0
Three, 3, 6.5, 5.9
Two, 2, 4.5, 6.7
One, 1, 5.4, 5.7, 4.7