Merging 2 CSV files


Problem description

OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy for me to do, but I find that I still do not have a very good grasp of Python. I tried the example at "How to combine 2 csv files with common column value, but both files have different number of lines", and that was helpful, but I still do not have the results that I was hoping to achieve.

Essentially, I have 2 CSV files with a common first column, and I would like to merge the two, i.e.:

filea.csv

title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921

fileb.csv

title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956

output.csv (not the one I am getting, but what I want)

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951

The code I was trying:

'''
testing merging of 2 csv files
'''
import csv
import array
import os

with open('Z:\\Desktop\\test\\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\\Desktop\\test\\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Solution

When I'm working with CSV files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the CSV files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

and we see there's an extra column, Unnamed: 5, containing only NaN (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
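
One caveat about this step: dropna(axis=1) uses how="any" by default, so it drops every column that contains at least one NaN. A genuine column with a single missing value would disappear along with the empty one. Below is a minimal sketch of a more conservative variant. It assumes a hypothetical copy of fileb.csv in which one apr value is missing; the inline FILEB_WITH_GAP text is made up for illustration and is not part of the original data.

```python
import io

import pandas as pd

# Hypothetical variant of fileb.csv: same trailing comma in the header,
# plus one missing "apr" value in the "ok" row (not in the original data).
FILEB_WITH_GAP = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,,2.456,0.3216
three,0.285,1.283,0.924,956
"""

b = pd.read_csv(io.StringIO(FILEB_WITH_GAP))

# Default how="any": drops every column containing at least one NaN,
# so "apr" is lost together with "Unnamed: 5".
dropped_any = b.dropna(axis=1)

# how="all": drops a column only if all of its values are NaN,
# so only "Unnamed: 5" is removed.
dropped_all = b.dropna(axis=1, how="all")

print(list(dropped_any.columns))
print(list(dropped_all.columns))
```

With how="all", only the column created by the trailing comma is removed, and the partly filled apr column is kept.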

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
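
Note that merge performs an inner join by default, so a title that appears in only one of the files is dropped from the result. The question mentions files with different numbers of lines, so here is a small sketch of the difference. The two inline inputs are shortened, made-up stand-ins for filea.csv and fileb.csv, chosen so that the keys only partly overlap.

```python
import io

import pandas as pd

# Made-up, shortened inputs: "darn" exists only in a, "three" only in b.
a = pd.read_csv(io.StringIO(
    "title,stage,jan,feb\n"
    "darn,3.001,0.421,0.532\n"
    "ok,2.829,1.036,0.751\n"
))
b = pd.read_csv(io.StringIO(
    "title,mar\n"
    "ok,1.001\n"
    "three,0.285\n"
))

# Default (how="inner"): only titles present in both frames survive.
inner = a.merge(b, on="title")

# how="outer": every title from either frame is kept; cells with no
# matching row on the other side are filled with NaN.
outer = a.merge(b, on="title", how="outer")

print(inner)
print(outer)
```

Whether missing rows should be dropped or kept as NaN depends on what the merged file is used for; how="left" and how="right" are the in-between options.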

and finally write this out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
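
If installing pandas is not an option, roughly the same merge can be sketched with the standard csv module that the question started from. This is only an illustration written for Python 3, not the code from the question: the helper names read_rows and merge_on_first_column are hypothetical, and the files are inlined as strings so the snippet runs on its own (with real files, open them with newline="" and pass the file objects to csv.reader and csv.writer instead).

```python
import csv
import io

FILEA = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""

FILEB = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""


def read_rows(text):
    """Parse CSV text into (header, {first column: remaining fields})."""
    rows = list(csv.reader(io.StringIO(text)))
    # Drop the empty name produced by a trailing comma in the header.
    header = [name for name in rows[0] if name]
    width = len(header)
    body = {row[0]: row[1:width] for row in rows[1:] if row}
    return header, body


def merge_on_first_column(text_a, text_b):
    """Join two CSV texts on their first column (keys present in both)."""
    header_a, rows_a = read_rows(text_a)
    header_b, rows_b = read_rows(text_b)
    merged = [header_a + header_b[1:]]
    # Dicts preserve insertion order, so rows follow the order of file a.
    for key, values in rows_a.items():
        if key in rows_b:
            merged.append([key] + values + rows_b[key])
    return merged


merged_rows = merge_on_first_column(FILEA, FILEB)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged_rows)
print(out.getvalue(), end="")
```

Because the csv module treats every field as a string, values are written back exactly as they were read, so the last value stays 956 rather than becoming 956.0 as in the pandas output.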
                        