Problem description
I have two dataframes of different sizes with related data that I want to merge in an efficient way:
import pandas as pd

master_df = pd.DataFrame({'kpi_1': [1, 2, 3, 4]},
                         index=['dn1_app1_bar.com',
                                'dn1_app2_bar.com',
                                'dn2_app1_foo.com',
                                'dn2_app2_foo.com'])
guard_df = pd.DataFrame({'kpi_2': [1, 2],
                         'kpi_3': [10, 20]},
                        index=['dn1_bar.com', 'dn2_foo.com'])
master_df:
kpi_1
dn1_app1_bar.com 1
dn1_app2_bar.com 2
dn2_app1_foo.com 3
dn2_app2_foo.com 4
guard_df:
kpi_2 kpi_3
dn1_bar.com 1 10
dn2_foo.com 2 20
I want to get a dataframe where the values from a guard_df row indexed with <group>_<name> are "propagated" to all master_df rows matching <group>_.*_<name>.
Expected result:
kpi_1 kpi_2 kpi_3
dn1_app1_bar.com 1 1.0 10.0
dn1_app2_bar.com 2 1.0 10.0
dn2_app1_foo.com 3 2.0 20.0
dn2_app2_foo.com 4 2.0 20.0
What I've managed so far is the following basic approach:
def eval_base_dn(dn):
    # 'dn1_app1_bar.com' -> 'dn1_bar.com': drop the middle chunk
    chunks = dn.split('_')
    return '_'.join((chunks[0], chunks[2]))

for dn in master_df.index:
    for col in guard_df.columns:
        master_df.loc[dn, col] = guard_df.loc[eval_base_dn(dn), col]
but I'm looking for a more performant way to "broadcast" the values and merge the dataframes.
Recommended answer
With pandas 0.25+ you can pass an array (here, derived from the index) to merge's left_on parameter and use a left join:
master_df = master_df.merge(guard_df,
left_on=master_df.index.str.replace('_.+_', '_'),
right_index=True,
how='left')
print (master_df)
kpi_1 kpi_2 kpi_3
dn1_app1_bar.com 1 1 10
dn1_app2_bar.com 2 1 10
dn2_app1_foo.com 3 2 20
dn2_app2_foo.com 4 2 20
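A note for newer pandas: from pandas 2.0 the default of str.replace is regex=False, so the pattern '_.+_' above would be treated as a literal string. Passing regex=True explicitly keeps the answer's behaviour; this is just the same merge spelled out for current pandas, nothing else changed:

master_df = master_df.merge(
    guard_df,
    # '_.+_' strips the middle chunk: 'dn1_app1_bar.com' -> 'dn1_bar.com'
    left_on=master_df.index.str.replace('_.+_', '_', regex=True),
    right_index=True,
    how='left')

The same join key can also be built with the question's eval_base_dn helper via master_df.index.map(eval_base_dn), which avoids the regex entirely.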