Problem description
I have a folder with numerous csv files which look like this:
csv1
2006 Percent Land_Use
0 13 5.379564 Developed
1 8 25.781580 Grass/Pasture
2 4 54.265050 Crop
3 15 0.363983 Water
4 16 6.244104 Wetlands
5 6 4.691764 Forest
6 1 3.031494 Alfalfa
7 11 0.137424 Shrubland
8 5 0.003671 Vetch
9 3 0.055412 Barren
10 7 0.009531 Grass
11 12 0.036423 Tree
csv2
2007 Percent Land_Use
0 13 2.742430 Developed
1 4 56.007242 Crop
2 8 24.227963 Grass/Pasture
3 16 8.839979 Wetlands
4 6 6.181062 Forest
5 1 1.446668 Alfalfa
6 15 0.366116 Water
7 3 0.127760 Barren
8 11 0.034426 Shrubland
9 7 0.000827 Grass
10 12 0.025528 Tree
csv3
2008 Percent Land_Use
0 13 1.863809 Developed
1 8 31.455578 Grass/Pasture
2 4 57.896856 Crop
3 16 2.693929 Wetlands
4 6 4.417966 Forest
5 1 1.239176 Alfalfa
6 7 0.130849 Grass
7 15 0.266536 Water
8 11 0.004571 Shrubland
9 3 0.030731 Barren
I want to merge them all into one DataFrame on the Land_Use column.
I am reading in the files like this:
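import os
import pandas as pd
pth = 'G:\\'  # r'G:\' is a syntax error: a raw string cannot end in a backslash
for f in os.listdir(pth):
    # df is overwritten on every iteration, so only the last file survives
    df = pd.read_csv(os.path.join(pth, f))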
But I can't figure out how to merge all the individual DataFrames after that. I figured out how to concatenate them, but that isn't what I want. The type of merge I want is outer.
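To illustrate the difference the question describes, here is a minimal sketch using two three-row frames trimmed from csv1 and csv2 (the inline DataFrames are stand-ins for the files). pd.concat stacks rows, while an outer merge aligns rows on Land_Use:

```python
import pandas as pd

# Trimmed stand-ins for csv1 and csv2 (values copied from the question)
one = pd.DataFrame({'2006': [13, 8, 5], 'Percent': [5.379564, 25.781580, 0.003671],
                    'Land_Use': ['Developed', 'Grass/Pasture', 'Vetch']})
two = pd.DataFrame({'2007': [13, 8, 12], 'Percent': [2.742430, 24.227963, 0.025528],
                    'Land_Use': ['Developed', 'Grass/Pasture', 'Tree']})

# pd.concat stacks rows by default: each Land_Use value appears once per file
stacked = pd.concat([one, two], ignore_index=True)

# An outer merge aligns rows on Land_Use and keeps keys found in either frame
merged = pd.merge(one, two, on='Land_Use', how='outer')

print(len(stacked))  # 6
print(len(merged))   # 4: Developed, Grass/Pasture, Vetch, Tree
```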
If I were to use a path to each CSV file, I would merge them like this, but I do NOT want to set a path for each file, as there are many of them:
import pandas as pd

one = pd.read_csv(r'G:\one.csv')
two = pd.read_csv(r'G:\two.csv')
three = pd.read_csv(r'G:\three.csv')
merge = pd.merge(one, two, on=['Land_Use'], how='outer')
mergetwo = pd.merge(merge, three, on=['Land_Use'], how='outer')
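The chained merges above generalize to a loop over any number of frames: start from the first DataFrame and outer-merge each following one into the running result. A minimal sketch, where merge_all is a hypothetical helper name and the tiny inline frames stand in for the CSV files:

```python
import pandas as pd

def merge_all(frames, key='Land_Use'):
    """Outer-merge a list of DataFrames on `key`, left to right (hypothetical helper)."""
    result = frames[0]
    for frame in frames[1:]:
        result = pd.merge(result, frame, on=key, how='outer')
    return result

# Stand-ins for the files; only a couple of rows each
one = pd.DataFrame({'2006': [13, 5], 'Land_Use': ['Developed', 'Vetch']})
two = pd.DataFrame({'2007': [13, 12], 'Land_Use': ['Developed', 'Tree']})
three = pd.DataFrame({'2008': [13, 7], 'Land_Use': ['Developed', 'Grass']})

merged = merge_all([one, two, three])
print(merged)
```

This is exactly what functools.reduce does with a two-argument merge function, so the loop and the reduce call in the answer are interchangeable.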
Recommended answer
I think you can use functools.reduce in Python 3:
import functools
import pandas as pd

dfs = [df1, df2, df3]  # the DataFrames read from csv1, csv2 and csv3
df = functools.reduce(lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), dfs)
print(df)
2006 Percent_x Land_Use 2007 Percent_y 2008 Percent
0 13 5.379564 Developed 13.0 2.742430 13.0 1.863809
1 8 25.781580 Grass/Pasture 8.0 24.227963 8.0 31.455578
2 4 54.265050 Crop 4.0 56.007242 4.0 57.896856
3 15 0.363983 Water 15.0 0.366116 15.0 0.266536
4 16 6.244104 Wetlands 16.0 8.839979 16.0 2.693929
5 6 4.691764 Forest 6.0 6.181062 6.0 4.417966
6 1 3.031494 Alfalfa 1.0 1.446668 1.0 1.239176
7 11 0.137424 Shrubland 11.0 0.034426 11.0 0.004571
8 5 0.003671 Vetch NaN NaN NaN NaN
9 3 0.055412 Barren 3.0 0.127760 3.0 0.030731
10 7 0.009531 Grass 7.0 0.000827 7.0 0.130849
11 12 0.036423 Tree 12.0 0.025528 NaN NaN
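In this output, Percent_x and Percent_y come from pandas' default merge suffixes ('_x', '_y'), which are appended when both frames share a non-key column name; the third Percent keeps its plain name because by then it no longer collides. NaN marks a Land_Use missing from a file (Vetch appears only in csv1, and Tree is absent from csv3). One way to get unambiguous names, sketched under the assumption that the first column header of each file is its year, is to rename Percent before merging:

```python
import functools
import pandas as pd

# Trimmed stand-ins for csv1..csv3; the first column header is the year
dfs = [
    pd.DataFrame({'2006': [13, 5], 'Percent': [5.379564, 0.003671],
                  'Land_Use': ['Developed', 'Vetch']}),
    pd.DataFrame({'2007': [13, 12], 'Percent': [2.742430, 0.025528],
                  'Land_Use': ['Developed', 'Tree']}),
    pd.DataFrame({'2008': [13, 7], 'Percent': [1.863809, 0.130849],
                  'Land_Use': ['Developed', 'Grass']}),
]

# Give each Percent column a year-specific name so no non-key column overlaps,
# which means merge never needs to append _x / _y suffixes
renamed = [d.rename(columns={'Percent': 'Percent_' + d.columns[0]}) for d in dfs]

df = functools.reduce(
    lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), renamed)
print(df.columns.tolist())
```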
In Python 2, reduce is a builtin, so no import is needed:
df = reduce(lambda left,right: pd.merge(left,right,on='Land_Use',how='outer'),dfs)
Working solution with glob:
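import functools
import glob

import pandas as pd

pth = 'a/*.csv'
files = glob.glob(pth)  # results are not guaranteed to be sorted, so year columns may appear out of order
# sep=';' assumes semicolon-delimited files; drop it for comma-separated ones
dfs = [pd.read_csv(f, sep=';') for f in files]
df = functools.reduce(lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), dfs)
print(df)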
2006 Percent_x Land_Use 2008 Percent_y 2007 Percent
0 13 5.379564 Developed 13.0 1.863809 13.0 2.742430
1 8 25.781580 Grass/Pasture 8.0 31.455578 8.0 24.227963
2 4 54.265050 Crop 4.0 57.896856 4.0 56.007242
3 15 0.363983 Water 15.0 0.266536 15.0 0.366116
4 16 6.244104 Wetlands 16.0 2.693929 16.0 8.839979
5 6 4.691764 Forest 6.0 4.417966 6.0 6.181062
6 1 3.031494 Alfalfa 1.0 1.239176 1.0 1.446668
7 11 0.137424 Shrubland 11.0 0.004571 11.0 0.034426
8 5 0.003671 Vetch NaN NaN NaN NaN
9 3 0.055412 Barren 3.0 0.030731 3.0 0.127760
10 7 0.009531 Grass 7.0 0.130849 7.0 0.000827
11 12 0.036423 Tree NaN NaN 12.0 0.025528
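Because glob.glob does not guarantee sorted results, the year columns above come out as 2006, 2008, 2007. A minimal self-contained sketch of the same approach that wraps the file list in sorted() so the merge order follows the file names; the temporary folder and the year-named files (2006.csv and so on) are made-up stand-ins for the real data:

```python
import functools
import glob
import os
import tempfile

import pandas as pd

samples = {
    '2006.csv': '2006;Percent;Land_Use\n13;5.379564;Developed\n5;0.003671;Vetch\n',
    '2007.csv': '2007;Percent;Land_Use\n13;2.742430;Developed\n12;0.025528;Tree\n',
    '2008.csv': '2008;Percent;Land_Use\n13;1.863809;Developed\n7;0.130849;Grass\n',
}

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample semicolon-delimited files into the temporary folder
    for name, text in samples.items():
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(text)

    # sorted() makes the merge order, and hence the column order, deterministic
    files = sorted(glob.glob(os.path.join(tmp, '*.csv')))
    dfs = [pd.read_csv(f, sep=';') for f in files]
    df = functools.reduce(
        lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), dfs)

print(df)
```

Depending on the pandas version, how='outer' may also sort the resulting rows by Land_Use, so the row order can differ from the output shown above.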