Problem description
I'm new to pandas operations, and I have these two dataframes:
import pandas as pd
df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id':[1,2,1,2,1,2], 'val1':[0,0,0,0,0,0],'val2':[0,0,0,0,0,0],'val3':[0,0,0,0,0,0]})
   id name  val1  val2  val3
0   1    a     0     0     0
1   2    a     0     0     0
2   1    b     0     0     0
3   2    b     0     0     0
4   1    c     0     0     0
5   2    c     0     0     0
subdf = pd.DataFrame({'name': ['a','b','c'], 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})
   id name  val1  val2
0   1    a   0.3     4
1   1    b   0.4     5
2   2    c   0.7     4
I would like to obtain this output:
   id name  val1  val2  val3
0   1    a   0.3     4     0
1   2    a   0.0     0     0
2   1    b   0.4     5     0
3   2    b   0.0     0     0
4   1    c   0.0     0     0
5   2    c   0.7     4     0
But the tutorials I have seen only show how to add columns or rows; I could not find an example of replacing existing values.
Recommended answer
This takes a couple of steps. First, do a left merge on the columns that match; where the remaining column names clash, pandas keeps both and adds the suffixes '_x' (from df) and '_y' (from subdf):
In [25]:
merged = df.merge(subdf, on=['id', 'name'], how='left')
merged
Out[25]:
   id name  val1_x  val2_x  val3  val1_y  val2_y
0   1    a       0       0     0     0.3       4
1   2    a       0       0     0     NaN     NaN
2   1    b       0       0     0     0.4       5
3   2    b       0       0     0     NaN     NaN
4   1    c       0       0     0     NaN     NaN
5   2    c       0       0     0     0.7       4
In [26]:
# take the values of interest from the clashing columns
# (the row-wise max ignores NaN)
merged['val1'] = merged[['val1_x', 'val1_y']].max(axis=1)
merged['val2'] = merged[['val2_x', 'val2_y']].max(axis=1)
merged
Out[26]:
   id name  val1_x  val2_x  val3  val1_y  val2_y  val1  val2
0   1    a       0       0     0     0.3       4   0.3     4
1   2    a       0       0     0     NaN     NaN   0.0     0
2   1    b       0       0     0     0.4       5   0.4     5
3   2    b       0       0     0     NaN     NaN   0.0     0
4   1    c       0       0     0     NaN     NaN   0.0     0
5   2    c       0       0     0     0.7       4   0.7     4
In [27]:
# drop the additional columns
merged = merged.drop(labels=['val1_x', 'val1_y','val2_x', 'val2_y'], axis=1)
merged
Out[27]:
   id name  val3  val1  val2
0   1    a     0   0.3     4
1   2    a     0   0.0     0
2   1    b     0   0.4     5
3   2    b     0   0.0     0
4   1    c     0   0.0     0
5   2    c     0   0.7     4
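Taking the row-wise maximum gives the right answer here only because every placeholder in df is 0 and every replacement value is larger. A minimal sketch of a variant that does not depend on that, assuming the same df and subdf as in the question: keep the subdf value wherever the merge found a match and fall back to the original value otherwise. The '_new' suffix and the loop over column names are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'id': [1, 2, 1, 2, 1, 2],
                   'val1': [0, 0, 0, 0, 0, 0],
                   'val2': [0, 0, 0, 0, 0, 0],
                   'val3': [0, 0, 0, 0, 0, 0]})
subdf = pd.DataFrame({'name': ['a', 'b', 'c'],
                      'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7],
                      'val2': [4, 5, 4]})

# A left merge keeps every row of df; clashing columns coming from subdf
# get the '_new' suffix, while df's own columns keep their names.
merged = df.merge(subdf, on=['id', 'name'], how='left', suffixes=('', '_new'))

for col in ['val1', 'val2']:
    # Use the subdf value where one exists, otherwise keep the original.
    merged[col] = merged[col + '_new'].fillna(merged[col])
    merged = merged.drop(columns=col + '_new')

print(merged)
```

The replaced columns come back as float, because the unmatched rows held NaN during the merge; cast them back with astype if integer columns are needed.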
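Another method is to move 'id' and 'name' into the index of both dataframes and then call update. update matches rows by index label, so sorting the rows alone does not line them up:
In [30]:
# cast the columns being replaced to float so update() does not have to change their dtype
df = df.astype({'val1': float, 'val2': float})
df = df.set_index(['id', 'name'])
df.update(subdf.set_index(['id', 'name']))  # modifies df in place
df = df.reset_index()
df
Out[30]:
   id name  val1  val2  val3
0   1    a   0.3   4.0     0
1   2    a   0.0   0.0     0
2   1    b   0.4   5.0     0
3   2    b   0.0   0.0     0
4   1    c   0.0   0.0     0
5   2    c   0.7   4.0     0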
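DataFrame.update matches rows by index label, not by the values stored in 'id' and 'name', so the key columns have to be in the index for the replacement to land on the right rows. The steps can be wrapped in a small helper. This is only a sketch: replace_from and its parameter names are hypothetical, and it assumes that the key combinations are unique in both frames and that the shared columns are numeric.

```python
import pandas as pd


def replace_from(base, patch, keys):
    """Return a copy of `base` where rows sharing `keys` with `patch`
    take their values from `patch` (hypothetical helper, not a pandas API)."""
    out = base.set_index(keys)
    patch = patch.set_index(keys)
    shared = out.columns.intersection(patch.columns)
    # Assumption: the shared columns are numeric. Casting them to float up
    # front means update() never has to write a float into an int column.
    out = out.astype({col: float for col in shared})
    out.update(patch)  # in place, aligned on the index built from `keys`
    return out.reset_index()


df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'id': [1, 2, 1, 2, 1, 2],
                   'val1': [0, 0, 0, 0, 0, 0],
                   'val2': [0, 0, 0, 0, 0, 0],
                   'val3': [0, 0, 0, 0, 0, 0]})
subdf = pd.DataFrame({'name': ['a', 'b', 'c'],
                      'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7],
                      'val2': [4, 5, 4]})

result = replace_from(df, subdf, ['id', 'name'])
print(result)
```

Because the helper works on copies, df itself is left untouched and the row order of df is preserved in the result.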