How to merge two DataFrames when one column must match either of two columns

Question

How can I .merge two DataFrames when one column has to be matched against two columns?

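  • The goal is to merge the two DataFrames so that each campaign id in the reference table gets a count of its records in the data table.
  • The problem: .merge only compares one column with one column.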

The data is messy: some rows contain the campaign name rather than the campaign id.

Merging works if I match 1 column to 1 column, or 2 columns to 2 columns, but NOT 1 column to 2 columns.

Reference table

g_spend =

campaignid   id_name      cost

154          campaign1    15
155          campaign2    12
1566         campaign33   12
158          campaign4    33

Data

cw = 

campaignid

154
154
155
campaign1    
campaign33
1566
158
campaign1    
campaign1    
campaign33
campaign4
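
For anyone who wants to reproduce the problem, the two tables above can be rebuilt roughly as follows. This is only a sketch: it assumes the ids are stored as strings (the data column mixes ids and names, so it has to be an object column anyway), and the variable name_rows is introduced here purely for illustration.

```python
import pandas as pd

# Reference table: one row per campaign, with its id, its name and its cost.
# Assumption: ids are kept as strings so they compare equal to the ids in cw.
g_spend = pd.DataFrame({
    "campaignid": ["154", "155", "1566", "158"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4"],
    "cost": [15, 12, 12, 33],
})

# Data table: the campaignid column mixes real ids with campaign names.
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Boolean mask of the rows that hold a campaign name instead of an id.
name_rows = cw["campaignid"].isin(g_spend["id_name"])
print(name_rows.sum())  # 6 of the 11 rows hold a name
```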

Desired output



g_spend =

campaignid  id_name      cost    leads

154        campaign1    15       5
155        campaign2    12       1
1566       campaign33   12       3
158        campaign4    33       2

What I tried

# This only works when matching on a single column

cw.head()
grouped_cw = cw.groupby(["campaignid"]).count()
grouped_cw.rename(columns={'reach':'leads'}, inplace=True)

grouped_cw = pd.DataFrame(grouped_cw)


# now merging
g_spend.campaignid = g_spend.campaignid.astype(str)

g_spend = g_spend.merge(grouped_cw, left_on='campaignid', right_index=True)
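
To see why the one-column merge falls short, here is a small sketch on the sample data from the question. It assumes the ids are strings, and it counts with value_counts rather than the groupby above, since the sample cw shown here has only one column. Rows that hold a campaign name are counted under the name, so they never match the id column.

```python
import pandas as pd

g_spend = pd.DataFrame({
    "campaignid": ["154", "155", "1566", "158"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4"],
    "cost": [15, 12, 12, 33],
})
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Count rows per raw value: ids and names end up under separate keys.
counts = cw["campaignid"].value_counts().rename("leads")

# Matching on the id column only picks up the rows that already hold an id.
merged = g_spend.merge(counts, left_on="campaignid", right_index=True)
print(merged)

# Only 5 of the 11 rows are accounted for; the 6 name rows are lost.
print(merged["leads"].sum(), "of", len(cw))
```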

Recommended answer

I would first set id_name as the index of g_spend, use the resulting Series to replace the names in cw with their ids, and then call value_counts:

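# Map campaign names to ids, then count how often each id occurs.
# Assumes g_spend.campaignid has already been cast to str, as in the question.
s = (cw.campaignid
       .replace(g_spend.set_index('id_name').campaignid)
       .value_counts()
       .to_frame('leads')
    )

g_spend = g_spend.merge(s, left_on='campaignid', right_index=True)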

Output:

  campaignid     id_name  cost  leads
0        154   campaign1    15      5
1        155   campaign2    12      1
2       1566  campaign33    12      3
3        158   campaign4    33      2
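
One caveat worth noting: merge defaults to an inner join, so a campaign with no rows in cw would drop out of the result entirely. The sketch below is one possible variation of the replace-then-count approach that keeps such campaigns with a count of 0. The fifth campaign (159 / campaign5) is hypothetical and was added only to show the zero case; the string ids and the names name_to_id, normalized and result are likewise assumptions made for this example.

```python
import pandas as pd

g_spend = pd.DataFrame({
    # "159" / "campaign5" is a hypothetical campaign with no leads.
    "campaignid": ["154", "155", "1566", "158", "159"],
    "id_name": ["campaign1", "campaign2", "campaign33", "campaign4", "campaign5"],
    "cost": [15, 12, 12, 33, 7],
})
cw = pd.DataFrame({
    "campaignid": ["154", "154", "155", "campaign1", "campaign33", "1566",
                   "158", "campaign1", "campaign1", "campaign33", "campaign4"],
})

# Lookup Series: index is the campaign name, values are the campaign ids.
name_to_id = g_spend.set_index("id_name")["campaignid"]

# Replace names with ids; values that are already ids are left untouched.
normalized = cw["campaignid"].replace(name_to_id)

# Count leads per id.
leads = normalized.value_counts().rename("leads")

# Left join keeps every campaign; missing counts become 0.
result = g_spend.merge(leads, how="left", left_on="campaignid", right_index=True)
result["leads"] = result["leads"].fillna(0).astype(int)
print(result)
```

With how="left" the reference table drives the result, which is usually what is wanted when the counts are being attached to a list of campaigns.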

