Problem description
OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy to do, but I find that I still do not have a very good grasp of Python. I tried the example in "How to combine 2 csv files with common column value, but both files have different number of lines", and that was helpful, but I still do not have the results I was hoping to achieve.

Essentially I have 2 csv files with a common first column. I would like to merge the 2, i.e.

filea.csv

    title,stage,jan,feb
    darn,3.001,0.421,0.532
    ok,2.829,1.036,0.751
    three,1.115,1.146,2.921

fileb.csv

    title,mar,apr,may,jun,
    darn,0.631,1.321,0.951,1.751
    ok,1.001,0.247,2.456,0.3216
    three,0.285,1.283,0.924,956

output.csv (not the one I am getting but what I want)

    title,stage,jan,feb,mar,apr,may,jun
    darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
    ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
    three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

    title,feb,may
    ok,0.751,2.456
    three,2.921,0.924
    darn,0.532,0.951

The code I was trying:

    '''
    testing merging of 2 csv files
    '''
    import csv
    import array
    import os

    with open('Z:\\Desktop\\test\\filea.csv') as f:
        r = csv.reader(f, delimiter=',')
        dict1 = {row[0]: row[3] for row in r}

    with open('Z:\\Desktop\\test\\fileb.csv') as f:
        r = csv.reader(f, delimiter=',')
        #dict2 = {row[0]: row[3] for row in r}
        dict2 = {row[0:3] for row in r}

    print str(dict1)
    print str(dict2)

    keys = set(dict1.keys() + dict2.keys())
    with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
        w = csv.writer(f, delimiter=',')
        w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Solution

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

    import pandas as pd

    a = pd.read_csv("filea.csv")
    b = pd.read_csv("fileb.csv")
    b = b.dropna(axis=1)
    merged = a.merge(b, on='title')
    merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the csv files:

    >>> a = pd.read_csv("filea.csv")
    >>> b = pd.read_csv("fileb.csv")
    >>> a
       title  stage    jan    feb
    0   darn  3.001  0.421  0.532
    1     ok  2.829  1.036  0.751
    2  three  1.115  1.146  2.921
    >>> b
       title    mar    apr    may       jun  Unnamed: 5
    0   darn  0.631  1.321  0.951    1.7510         NaN
    1     ok  1.001  0.247  2.456    0.3216         NaN
    2  three  0.285  1.283  0.924  956.0000         NaN

and we see there's an extra column of data (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:

    >>> b = b.dropna(axis=1)
    >>> b
       title    mar    apr    may       jun
    0   darn  0.631  1.321  0.951    1.7510
    1     ok  1.001  0.247  2.456    0.3216
    2  three  0.285  1.283  0.924  956.0000

Now we can merge a and b on the title column:

    >>> merged = a.merge(b, on='title')
    >>> merged
       title  stage    jan    feb    mar    apr    may       jun
    0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
    1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
    2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000

and finally write this out:

    >>> merged.to_csv("output.csv", index=False)

producing:

    title,stage,jan,feb,mar,apr,may,jun
    darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
    ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
    three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
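
If pandas is not an option, the same join can probably be done with only the standard csv module, which is closer to what the question attempted. The following is a minimal sketch, not a definitive implementation. It assumes Python 3 (the question's code is Python 2), assumes the key is always the first column, and assumes the only stray empty header cell is the trailing one caused by the extra comma. The helper names `read_table` and `merge_tables` are made up for this illustration, and the sample data is embedded as strings so the snippet runs on its own.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (header, {key: remaining values})."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = list(rows[0])
    # A trailing comma in the header line yields an empty last name; drop it.
    while header and header[-1] == "":
        header.pop()
    # Keep every value after the key, up to the number of named columns.
    table = {row[0]: row[1:len(header)] for row in rows[1:]}
    return header, table


def merge_tables(text_a, text_b):
    """Join two CSV texts on their first column (inner join, order of the first)."""
    header_a, table_a = read_table(text_a)
    header_b, table_b = read_table(text_b)
    merged = [header_a + header_b[1:]]
    for key, values_a in table_a.items():
        if key in table_b:  # skip keys that are missing from the second file
            merged.append([key] + values_a + table_b[key])
    return merged


FILE_A = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""

FILE_B = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""

merged_rows = merge_tables(FILE_A, FILE_B)

# Write to an in-memory buffer here; for a real file, open it with
# open(path, "w", newline="") and pass that to csv.writer instead.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged_rows)
print(out.getvalue())
```

Because the values stay as strings, the last field of the final row is written as 956 rather than the 956.0 that pandas produces. The question's version kept only `row[3]` per key, which is why only the feb and may columns survived; storing the whole slice of remaining values per key is the main change.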