Merging DataFrames in a loop

Question

I have a folder with numerous CSV files that look like this:

csv1

    2006    Percent       Land_Use
0     13   5.379564      Developed
1      8  25.781580  Grass/Pasture
2      4  54.265050           Crop
3     15   0.363983          Water
4     16   6.244104       Wetlands
5      6   4.691764         Forest
6      1   3.031494        Alfalfa
7     11   0.137424      Shrubland
8      5   0.003671          Vetch
9      3   0.055412         Barren
10     7   0.009531          Grass
11    12   0.036423           Tree

csv2

   2007    Percent       Land_Use
0     13   2.742430      Developed
1      4  56.007242           Crop
2      8  24.227963  Grass/Pasture
3     16   8.839979       Wetlands
4      6   6.181062         Forest
5      1   1.446668        Alfalfa
6     15   0.366116          Water
7      3   0.127760         Barren
8     11   0.034426      Shrubland
9      7   0.000827          Grass
10    12   0.025528           Tree

csv3

    2008    Percent       Land_Use
0    13   1.863809      Developed
1     8  31.455578  Grass/Pasture
2     4  57.896856           Crop
3    16   2.693929       Wetlands
4     6   4.417966         Forest
5     1   1.239176        Alfalfa
6     7   0.130849          Grass
7    15   0.266536          Water
8    11   0.004571      Shrubland
9     3   0.030731         Barren

I want to merge them all into one DataFrame on the Land_Use column.

I am reading the files in like this:

import os
import pandas as pd

pth = 'G:\\'  # a raw string cannot end in a backslash, so escape it instead
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f))

But I can't figure out how to merge all the individual DataFrames after that. I worked out how to concat them, but that isn't what I want. The type of merge I want is outer.

If I used a separate path for each CSV file, I would merge them like this, but I do NOT want to set a path for each file, as there are many of them:

    one=pd.read_csv(r'G:\one.csv')
    two=pd.read_csv(r'G:\two.csv')
    three=pd.read_csv(r'G:\three.csv')
    merge=pd.merge(one,two, on=['Land_Use'], how='outer')
    mergetwo=pd.merge(merge,three,on=['Land_Use'], how='outer')

Recommended answer

I think you can use functools.reduce in Python 3:

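import functools
import pandas as pd

# df1, df2, df3 are the DataFrames read from the three CSV files
dfs = [df1, df2, df3]

# Each step outer-merges the running result with the next frame in the list
df = functools.reduce(lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), dfs)
print(df)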
    2006  Percent_x       Land_Use  2007  Percent_y  2008    Percent
0     13   5.379564      Developed  13.0   2.742430  13.0   1.863809
1      8  25.781580  Grass/Pasture   8.0  24.227963   8.0  31.455578
2      4  54.265050           Crop   4.0  56.007242   4.0  57.896856
3     15   0.363983          Water  15.0   0.366116  15.0   0.266536
4     16   6.244104       Wetlands  16.0   8.839979  16.0   2.693929
5      6   4.691764         Forest   6.0   6.181062   6.0   4.417966
6      1   3.031494        Alfalfa   1.0   1.446668   1.0   1.239176
7     11   0.137424      Shrubland  11.0   0.034426  11.0   0.004571
8      5   0.003671          Vetch   NaN        NaN   NaN        NaN
9      3   0.055412         Barren   3.0   0.127760   3.0   0.030731
10     7   0.009531          Grass   7.0   0.000827   7.0   0.130849
11    12   0.036423           Tree  12.0   0.025528   NaN        NaN

In Python 2, reduce is a built-in, so no import is needed:

df = reduce(lambda left,right: pd.merge(left,right,on='Land_Use',how='outer'),dfs)
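
For readers less familiar with reduce, the same fold can be written as an explicit loop, which is closer to what the question title asks for. This is only a minimal sketch: the three small frames below are hypothetical stand-ins, not the asker's files.

```python
import pandas as pd

# Tiny stand-in frames (hypothetical data, not the asker's CSV files)
df1 = pd.DataFrame({"2006": [13, 8], "Land_Use": ["Developed", "Grass/Pasture"]})
df2 = pd.DataFrame({"2007": [13, 4], "Land_Use": ["Developed", "Crop"]})
df3 = pd.DataFrame({"2008": [8, 4], "Land_Use": ["Grass/Pasture", "Crop"]})
dfs = [df1, df2, df3]

# Start from the first frame and outer-merge each remaining frame into it
merged = dfs[0]
for right in dfs[1:]:
    merged = pd.merge(merged, right, on="Land_Use", how="outer")

print(merged)
```

Land uses missing from a given year simply show NaN in that year's column, as in the Vetch and Tree rows above.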

Working solution with glob:

import pandas as pd
import functools
import glob

pth = 'a/*.csv'
files = glob.glob(pth)  # glob does not guarantee the order of the returned files
dfs = [pd.read_csv(f, sep=';') for f in files]  # adjust sep to match your files

df = functools.reduce(lambda left, right: pd.merge(left, right, on='Land_Use', how='outer'), dfs)
print(df)
    2006  Percent_x       Land_Use  2008  Percent_y  2007    Percent
0     13   5.379564      Developed  13.0   1.863809  13.0   2.742430
1      8  25.781580  Grass/Pasture   8.0  31.455578   8.0  24.227963
2      4  54.265050           Crop   4.0  57.896856   4.0  56.007242
3     15   0.363983          Water  15.0   0.266536  15.0   0.366116
4     16   6.244104       Wetlands  16.0   2.693929  16.0   8.839979
5      6   4.691764         Forest   6.0   4.417966   6.0   6.181062
6      1   3.031494        Alfalfa   1.0   1.239176   1.0   1.446668
7     11   0.137424      Shrubland  11.0   0.004571  11.0   0.034426
8      5   0.003671          Vetch   NaN        NaN   NaN        NaN
9      3   0.055412         Barren   3.0   0.030731   3.0   0.127760
10     7   0.009531          Grass   7.0   0.130849   7.0   0.000827
11    12   0.036423           Tree   NaN        NaN  12.0   0.025528
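
Because every file has a column named Percent, the merged result ends up with Percent_x, Percent_y and Percent, and with more than three files those automatic suffixes can collide (newer pandas versions may refuse such a merge). One possible workaround, sketched below, is to rename Percent per file before merging. The helper names tag_percent and merge_on_land_use are hypothetical, and the sketch assumes the year is the name of the first column in each file.

```python
import functools
import glob

import pandas as pd


def tag_percent(df):
    """Rename 'Percent' to 'Percent_<year>' (assumes the first column name is the year)."""
    year = df.columns[0]
    return df.rename(columns={"Percent": f"Percent_{year}"})


def merge_on_land_use(frames):
    """Outer-merge an iterable of frames on 'Land_Use', left to right."""
    return functools.reduce(
        lambda left, right: pd.merge(left, right, on="Land_Use", how="outer"),
        frames,
    )


# sorted() makes the column order reproducible; the 'a/*.csv' pattern and
# sep=';' are carried over from the answer above and may need adjusting.
files = sorted(glob.glob("a/*.csv"))
if files:
    df = merge_on_land_use(tag_percent(pd.read_csv(f, sep=";")) for f in files)
    print(df)
```

With this renaming, each year contributes its own pair of columns (for example 2006 and Percent_2006), so no suffixes are needed however many files are in the folder.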

08-11 17:41