本文介绍了在有条件的情况下对两个数据框进行复杂的Map操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在pandas df中有一张桌子.

i have a table in pandas df.

id    prod1    prod2    count
1      10       30        100
2      10       20        200
3      20       10        200
4      30       10        100
5      30       40        300

我在df2中还有另一个表

also i am having another table in df2

product  price   master_product
  1000       1          10
  5000       2          10
  2000       2          20
  9000       5          20
  8000       1          20
  30         3          0
  4000       4          50

检查prod1和prod2是否属于master_product中的值,

Check if prod1 and prod2 belongs to the values in master_product,

如果是我想用master_product中最便宜的产品替换我的第一个df中的prod1和prod2.

if yesi want to replace prod1 and prod2 in my first df with the cheapest product in my master_product.

如果prod1和prod2值与master_product中的值不匹配,保持原样.

if the prod1 and prod2 values donot match with the values in master_product,leave the values as it is.

我正在寻找决赛桌.

id    prod1    prod2    count
1      1000    4000        100
2      1000    8000        200
3      8000    1000        200
4      30       1000        100   #since 30 is not in master_product,leave as it is
5      30       40        300

我正在尝试使用.map函数来实现这一目标但我只能做到这一点.

i was trying to use .map function to achieve this but i could only reach to this.

df['prod1'] = df['prod1'].map(df2.set_index('master_product')['product'])
df['prod2'] = df['prod2'].map(df2.set_index('master_product')['product'])

但是它将尝试用df2的master_product中的匹配值替换prod1和prod2中的每个值.

but it will try to replace every values in prod1 and prod2 with matching values in master_product from df2.

有什么想法可以实现这一目标吗?

Any ideas how to achieve this?

推荐答案

通过 groupby idxmin -以最小的price获取所有索引:

You can first modify df1 for minimal price by master_product by groupby with idxmin - get all indices with minimal price:

df1 = df1.loc[df1.groupby('master_product')['price'].idxmin()]
print (df1)
   product  price  master_product
5       30      3               0
0     1000      1              10
4     8000      1              20
6     4000      4              50

创建dict进行映射:

d = df1.set_index('master_product')['product'].to_dict()
print (d)
{0: 30, 10: 1000, 20: 8000, 50: 4000}

最后一个 map 和如果缺少值,请通过 combine_first :

Last map and if value is missing add it by combine_first:

df.prod1 = df.prod1.map(d).combine_first(df.prod1)
df.prod2 = df.prod2.map(d).combine_first(df.prod2)
print (df)
   id   prod1   prod2  count
0   1  1000.0    30.0    100
1   2  1000.0  8000.0    200
2   3  8000.0  1000.0    200
3   4    30.0  1000.0    100
4   5    30.0    40.0    300

这篇关于在有条件的情况下对两个数据框进行复杂的Map操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!