Problem description
- I need to compare two columns together: "EMAIL" and "LOCATION".
- I'm using Email because it's more accurate than name for this issue.
My objective is to find the total number of locations each person worked at, use that total to select which sheet the data will be written to, and copy the original data over to the new sheet (tab).
I need the original data copied over with all the duplicate locations, which is where this problem stumps me.
Had to use images because it flagged post as spam
The Excel sheet (SAMPLE) I'm reading in as a data frame: Excel Sample Spreadsheet
Example:
Outcome:
- data copied to new sheets
Sheet 2
Sheet 3
Sheet 4
Thanks for taking your time looking at my problem =)
Hi, check whether the lines below work for you.
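import pandas as pd
# Read the sample workbook into a data frame
df = pd.read_excel('sample.xlsx')
# Count rows per Name/Location/Job; each remaining column (here Email) holds the row count
df1 = df.groupby(['Name', 'Location', 'Job']).count().reset_index()
# Same grouping with Email added to the key, and the counts given explicit column names
df2 = (
    df.groupby(['Name', 'Location', 'Job', 'Email'])
    .agg({'Location': 'count', 'Email': 'count'})
    .rename(columns={'Location': 'Location Count', 'Email': 'Email Count'})
    .reset_index()
)
print(df1)
print('\n\n')
print(df2)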
Below is the output. Change the grouping columns to check other variations.
df1
Name Location Job Email
0 Monty Jakarta Manager 1
1 Monty Mumbai Manager 1
2 Sahara Jonesh Paris Cook 2
3 Tom App Jakarta Buser 1
4 Tom App Paris Buser 2
df2 all columns
Name Location ... Location Count Email Count
0 Monty Jakarta ... 1 1
1 Monty Mumbai ... 1 1
2 Sahara Jonesh Paris ... 2 2
3 Tom App Jakarta ... 1 1
4 Tom App Paris ... 2 2
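The counts above do not yet copy the original rows into separate sheets, which is the part the question asks about. A minimal sketch of one way to do that follows. It assumes the column names Name, Location, Job and Email from the output above; the sample rows, the e-mail addresses, the sheet names, the output path and the helper names split_by_location_count and write_sheets are all hypothetical. It also assumes "number of locations" means distinct locations per Email; use "count" instead of "nunique" if repeated locations should be counted too.

```python
import pandas as pd

# Hypothetical sample rows; the e-mail addresses are invented for illustration.
df = pd.DataFrame({
    "Name": ["Monty", "Monty", "Sahara Jonesh", "Sahara Jonesh",
             "Tom App", "Tom App", "Tom App"],
    "Location": ["Jakarta", "Mumbai", "Paris", "Paris",
                 "Jakarta", "Paris", "Paris"],
    "Job": ["Manager", "Manager", "Cook", "Cook",
            "Buser", "Buser", "Buser"],
    "Email": ["monty@example.com", "monty@example.com",
              "sahara@example.com", "sahara@example.com",
              "tom@example.com", "tom@example.com", "tom@example.com"],
})


def split_by_location_count(data):
    """Map a sheet name to the original rows of everyone with that location count."""
    # transform() returns one value per original row, so no rows are dropped
    # and duplicate locations stay in the data.
    counts = data.groupby("Email")["Location"].transform("nunique")
    return {f"locations_{n}": rows for n, rows in data.groupby(counts)}


def write_sheets(groups, path):
    """Write each group of rows to its own sheet (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, rows in groups.items():
            rows.to_excel(writer, sheet_name=sheet_name, index=False)


groups = split_by_location_count(df)
for sheet_name, rows in groups.items():
    print(sheet_name, len(rows))

# Uncomment to write the workbook; "output.xlsx" is a placeholder path.
# write_sheets(groups, "output.xlsx")
```

Grouping by Email rather than Name follows the question's point that e-mail identifies a person more reliably than a name does.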