Problem description
I have already checked a few earlier questions, but my problem is somewhat different. I have three Excel files, and I load each of them into its own dataframe. I need to add up the contents of excel_1 and excel_2 and compare the result against excel_3.
Example data (excel_1, sales for Territory #1):
Name Year Item sales_Amount1
A1 1.2019 Badam 2
A1 1.2019 Badam 10
A1 1.2019 carrot 8
A1 1.2019 carrot 10
A2 1.2019 Badam 10
A2 1.2019 Badam 20
A3 2.2019 soap 3
A1 2.2019 soap 1
Example data (excel_2, sales for Territory #2):
Name Year Item sales_Amount2
A1 1.2019 Badam 60
A1 1.2019 Badam 10
A2 1.2019 Badam 40
A2 1.2019 Badam 1
A3 2.2019 soap 1
A3 2.2019 soap 10
A1 2.2019 soap 10
excel_3 (the target) has similar data:
Name Year Item target_Amount
A1 1.2019 Badam 100
A2 1.2019 Badam 30
A1 1.2019 carrot 200
A3 2.2019 soap 3
In short, I have to add sales amounts 1 and 2 and compare the result against the target data. I would like to arrive at a single CSV with the columns shown below, so that I can do the calculation described above.
Name Year Item sales_Amount1 sales_Amount2 target_Amount
A1 1.2019 Badam 12 70 100
A1 1.2019 carrot 18 0 200
A2 1.2019 Badam 30 41 30
A1 2.2019 soap 1 10 0
A3 2.2019 soap 3 11 3
import pandas as pd

# Load each workbook and sum its amount column per (Name, Year, Item).
df1 = pd.read_excel(r"excel_1.xlsx")
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
df2 = pd.read_excel(r"excel_2.xlsx")
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
df3 = pd.read_excel(r"excel_3.xlsx")
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})
So I load each file into its own dataframe and then use groupby to get the aggregated sum for each item, as shown above. The tricky part is now comparing and merging all three dataframes into one with the following columns:
Name Year Item sales_Amount1 sales_Amount2 target_Amount
Is this the right approach for comparing three dataframes and merging them into one, or should I use a pivot table instead? I am not sure which one to choose before I proceed. Thanks.
Recommended answer
Use concat with DataFrame.fillna:
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})
df = (pd.concat([sum_sales1, sum_sales2, sum_sales3], axis=1)
        .fillna(0)
        .astype(int)
        .reset_index())
print(df)
Name Year Item sales_Amount1 sales_Amount2 target_Amount
0 A1 1.2019 Badam 12 70 100
1 A1 1.2019 carrot 18 0 200
2 A1 2.2019 soap 1 10 0
3 A2 1.2019 Badam 30 41 30
4 A3 2.2019 soap 3 11 3
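Why this works: with axis=1, concat aligns the three grouped frames on their shared (Name, Year, Item) index, so a combination that is missing from one frame shows up as NaN, and fillna(0) then turns it into 0. Here is a minimal self-contained sketch of that. It assumes the sample data from the question is typed in by hand rather than read from Excel, and that Year is kept as a string; the names keys, wide and result are made up for illustration.

```python
import pandas as pd

keys = ["Name", "Year", "Item"]

# Sample data from the question, typed in by hand (Year kept as strings).
df1 = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A1", "A2", "A2", "A3", "A1"],
    "Year": ["1.2019"] * 6 + ["2.2019"] * 2,
    "Item": ["Badam", "Badam", "carrot", "carrot", "Badam", "Badam", "soap", "soap"],
    "sales_Amount1": [2, 10, 8, 10, 10, 20, 3, 1],
})
df2 = pd.DataFrame({
    "Name": ["A1", "A1", "A2", "A2", "A3", "A3", "A1"],
    "Year": ["1.2019"] * 4 + ["2.2019"] * 3,
    "Item": ["Badam", "Badam", "Badam", "Badam", "soap", "soap", "soap"],
    "sales_Amount2": [60, 10, 40, 1, 1, 10, 10],
})
df3 = pd.DataFrame({
    "Name": ["A1", "A2", "A1", "A3"],
    "Year": ["1.2019", "1.2019", "1.2019", "2.2019"],
    "Item": ["Badam", "Badam", "carrot", "soap"],
    "target_Amount": [100, 30, 200, 3],
})

# One summed frame per source, each indexed by (Name, Year, Item).
sums = [d.groupby(keys).sum() for d in (df1, df2, df3)]

# Column-wise concat aligns on the index; missing combinations become NaN.
wide = pd.concat(sums, axis=1)
print(wide)

# Replace the gaps with 0 and turn the index back into ordinary columns.
result = wide.fillna(0).astype(int).reset_index()
print(result)
```

The intermediate print of wide makes the gaps visible before they are filled: carrot has no sales in Territory #2, and A1's soap in 2.2019 has no target.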
If the column to aggregate is always the last one in each dataframe, use a list comprehension:
dfs = [df1, df2, df3]
dfs = [x.groupby(['Name','Year', 'Item']).agg({x.columns[-1]: 'sum'}) for x in dfs]
df = pd.concat(dfs, axis=1).fillna(0).astype(int).reset_index()
print (df)
Name Year Item sales_Amount1 sales_Amount2 target_Amount
0 A1 1.2019 Badam 12 70 100
1 A1 1.2019 carrot 18 0 200
2 A1 2.2019 soap 1 10 0
3 A2 1.2019 Badam 30 41 30
4 A3 2.2019 soap 3 11 3
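Once the frames are combined, the comparison the question asks for comes down to column arithmetic. The sketch below rebuilds the combined frame from the printed output so that it runs on its own. The column names total_sales, diff_vs_target and target_met are made up for illustration and appear in neither the question nor the answer.

```python
import pandas as pd

# Combined frame as printed above, typed in by hand so the snippet is self-contained.
df = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A2", "A3"],
    "Year": ["1.2019", "1.2019", "2.2019", "1.2019", "2.2019"],
    "Item": ["Badam", "carrot", "soap", "Badam", "soap"],
    "sales_Amount1": [12, 18, 1, 30, 3],
    "sales_Amount2": [70, 0, 10, 41, 11],
    "target_Amount": [100, 200, 0, 30, 3],
})

# Hypothetical column names, chosen for illustration only.
df["total_sales"] = df["sales_Amount1"] + df["sales_Amount2"]
df["diff_vs_target"] = df["total_sales"] - df["target_Amount"]
df["target_met"] = df["total_sales"] >= df["target_Amount"]

# to_csv without a path returns the CSV text; pass a file name to write to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Keep in mind that a target_Amount of 0 here may simply mean the row had no target (it was filled in by fillna), so target_met is only meaningful for rows that actually appear in excel_3.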