This article covers how to merge two DataFrames when a single column has to be matched against two columns.
Problem description
How do I .merge two DataFrames when one column has to match either of two columns?
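- Goal: merge the two DataFrames so that each campaign ID in the reference table gets a count of its records in the data.
- Problem: .merge only compares one column with one column.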
The data is messy: some rows contain the ID name rather than the ID.
Merging works if I match one column to one column, or two columns to two columns, but not one column to two columns.
Reference table
g_spend =
campaignid id_name cost
154 campaign1 15
155 campaign2 12
1566 campaign33 12
158 campaign4 33
Data
cw =
campaignid
154
154
155
campaign1
campaign33
1566
158
campaign1
campaign1
campaign33
campaign4
Desired output
g_spend =
campaignid id_name cost leads
154 campaign1 15 5
155 campaign2 12 1
1566 campaign33 12 3
158 campaign4 33 2
What I tried
# This only works when matching a single column.
cw.head()
# Count rows per campaignid; the rename assumes cw has a 'reach' column (not shown above).
grouped_cw = cw.groupby(["campaignid"]).count()
grouped_cw.rename(columns={'reach': 'leads'}, inplace=True)
# Merge the counts into the reference table; cast the IDs to str so the key dtypes match.
g_spend.campaignid = g_spend.campaignid.astype(str)
g_spend = g_spend.merge(grouped_cw, left_on='campaignid', right_index=True)
Recommended answer
I would first set id_name as the index in g_spend, then do a replace on cw, followed by a value_counts:
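# The merge keys must share a dtype (the question casts g_spend.campaignid to str first).
s = (cw.campaignid
       .replace(g_spend.set_index('id_name').campaignid)  # turn ID names into IDs
       .value_counts()                                    # count rows per ID
       .to_frame('leads')
    )
g_spend = g_spend.merge(s, left_on='campaignid', right_index=True)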
Output:
campaignid id_name cost leads
0 154 campaign1 15 5
1 155 campaign2 12 1
2 1566 campaign33 12 3
3 158 campaign4 33 2
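For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the sample data in the question. It makes a few assumptions that are not in the original: all IDs are stored as strings, the ID names are normalized with map plus fillna instead of replace (a swapped-in but equivalent step for this data), and the merge uses how="left" so that a campaign with no rows in cw would be kept with 0 leads rather than dropped. The names name_to_id, normalized, leads and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question. Assumption: IDs are stored as strings,
# so numeric IDs and ID names can share one column and the merge keys match.
g_spend = pd.DataFrame({
    "campaignid": ["154", "155", "1566", "158"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4"],
    "cost": [15, 12, 12, 33],
})
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Lookup Series: index is the ID name, value is the campaign ID.
name_to_id = g_spend.set_index("id_name")["campaignid"]

# Rows holding an ID name are mapped to the ID; rows that already hold an ID
# map to NaN and are filled back in with their original value.
normalized = cw["campaignid"].map(name_to_id).fillna(cw["campaignid"])

# Count rows per normalized ID.
leads = normalized.value_counts().rename("leads")

# Left merge keeps every campaign in the reference table; missing counts become 0.
result = g_spend.merge(leads, how="left", left_on="campaignid", right_index=True)
result["leads"] = result["leads"].fillna(0).astype(int)

print(result)
```

With the default inner merge, any campaign that never appears in cw would disappear from the result; the left merge plus fillna(0) is one way to keep such rows if that matters for your report.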