Here is a small/toy version of the much larger/more complex DataFrames I'm working with:
>>> A
key u v w x
0 a 0.757954 0.258917 0.404934 0.303313
1 b 0.583382 0.504687 NaN 0.618369
2 c NaN 0.982785 0.902166 NaN
3 d 0.898838 0.472143 NaN 0.610887
4 e 0.966606 0.865310 NaN 0.548699
5 f NaN 0.398824 0.668153 NaN
>>> B
key y z
0 a 0.867603 NaN
1 b NaN 0.191067
2 c 0.238616 0.803179
3 p 0.080446 NaN
4 q 0.932834 NaN
5 r 0.706561 0.814467
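(FWIW, I've included the code that generates these DataFrames at the end of this post.)
I want to produce an outer join of these DataFrames on the key column¹, such that the new positions created by the outer join get a default value of 0.0. IOW, the desired result looks like this:
key u v w x y z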
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 NaN
1 b 0.583382 0.504687 NaN 0.618369 NaN 0.191067
2 c NaN 0.982785 0.902166 NaN 0.238616 0.803179
3 d 0.898838 0.472143 NaN 0.610887 0.000000 0.000000
4 e 0.966606 0.865310 NaN 0.548699 0.000000 0.000000
5 f NaN 0.398824 0.668153 NaN 0.000000 0.000000
6 p 0.000000 0.000000 0.000000 0.000000 0.080446 NaN
7 q 0.000000 0.000000 0.000000 0.000000 0.932834 NaN
8 r 0.000000 0.000000 0.000000 0.000000 0.706561 0.814467
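(Note that this desired output contains some NaNs, namely the ones that were already present in A or B.)
The merge method gets me most of the way there, but it fills in NaN as the default value instead of 0.0:
>>> C = pandas.DataFrame.merge(A, B, how='outer', on='key')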
>>> C
key u v w x y z
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 NaN
1 b 0.583382 0.504687 NaN 0.618369 NaN 0.191067
2 c NaN 0.982785 0.902166 NaN 0.238616 0.803179
3 d 0.898838 0.472143 NaN 0.610887 NaN NaN
4 e 0.966606 0.865310 NaN 0.548699 NaN NaN
5 f NaN 0.398824 0.668153 NaN NaN NaN
6 p NaN NaN NaN NaN 0.080446 NaN
7 q NaN NaN NaN NaN 0.932834 NaN
8 r NaN NaN NaN NaN 0.706561 0.814467
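For reference, merge can also record where each output row came from via its indicator=True parameter, which is one way to tell the "new" NaNs introduced by the outer join apart from the pre-existing ones. A minimal sketch on two tiny hypothetical frames (left and right are illustrative, not the question's A and B):

```python
import pandas as pd

# Hypothetical toy frames (not the question's A and B)
left = pd.DataFrame({"key": ["a", "b"], "u": [1.0, float("nan")]})
right = pd.DataFrame({"key": ["b", "c"], "y": [2.0, 3.0]})

# indicator=True adds a categorical "_merge" column recording, per row,
# whether the key was found only in the left frame, only in the right, or in both
merged = pd.merge(left, right, how="outer", on="key", indicator=True)
print(merged)
```

Row "a" is "left_only" (so its y is a join-introduced NaN), "b" is "both" (its NaN in u was already in left), and "c" is "right_only".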
The fillna method doesn't produce the desired output, because it also modifies some positions that should be left unchanged:
>>> C.fillna(0.0)
key u v w x y z
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 0.000000
1 b 0.583382 0.504687 0.000000 0.618369 0.000000 0.191067
2 c 0.000000 0.982785 0.902166 0.000000 0.238616 0.803179
3 d 0.898838 0.472143 0.000000 0.610887 0.000000 0.000000
4 e 0.966606 0.865310 0.000000 0.548699 0.000000 0.000000
5 f 0.000000 0.398824 0.668153 0.000000 0.000000 0.000000
6 p 0.000000 0.000000 0.000000 0.000000 0.080446 0.000000
7 q 0.000000 0.000000 0.000000 0.000000 0.932834 0.000000
8 r 0.000000 0.000000 0.000000 0.000000 0.706561 0.814467
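How can I obtain the desired output efficiently? (Performance matters here, since I plan to do this on DataFrames much larger than the ones shown here.)
FWIW, below is the code that generates the example DataFrames A and B:
from pandas import DataFrame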
from collections import OrderedDict
from random import random, seed
def make_dataframe(rows, colnames):
    return DataFrame(OrderedDict([(n, [row[i] for row in rows])
                                  for i, n in enumerate(colnames)]))

maybe_nan = lambda: float('nan') if random() < 0.4 else random()

seed(0)
A = make_dataframe([['a', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['b', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['c', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['d', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['e', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['f', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()]],
                   ('key', 'u', 'v', 'w', 'x'))
B = make_dataframe([['a', maybe_nan(), maybe_nan()],
                    ['b', maybe_nan(), maybe_nan()],
                    ['c', maybe_nan(), maybe_nan()],
                    ['p', maybe_nan(), maybe_nan()],
                    ['q', maybe_nan(), maybe_nan()],
                    ['r', maybe_nan(), maybe_nan()]],
                   ('key', 'y', 'z'))
¹ For the case of an outer join on multiple keys, see here.
Best answer
You can fill in the zeros after the merge:
import pandas as pd
res = pd.merge(A, B, how="outer")
res.loc[~res.key.isin(A.key), A.columns] = 0
EDIT
To skip the key column:
res.loc[~res.key.isin(A.key), A.columns.drop("key")] = 0
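The desired output in the question also has zeros in y and z for d, e and f (keys that exist only in A), so the same isin mask idea presumably needs to be applied to B's columns as well. A minimal sketch of that symmetric version, on small hypothetical frames (the values here are made up for illustration, not the question's seeded data); since isin and boolean .loc assignment are vectorized, this should avoid any row-by-row Python loop, though timing on real data is untested here:

```python
import pandas as pd

nan = float("nan")

# Small hypothetical frames; the NaNs here are "real" missing values
# that must survive the join unchanged.
A = pd.DataFrame({"key": ["a", "b", "d"], "u": [0.1, nan, 0.4], "w": [nan, 0.2, 0.5]})
B = pd.DataFrame({"key": ["a", "b", "p"], "y": [0.7, nan, 0.8]})

res = pd.merge(A, B, how="outer", on="key")

# Rows whose key is absent from A got NaN in A's value columns -> set to 0.0
res.loc[~res["key"].isin(A["key"]), A.columns.drop("key")] = 0.0
# Rows whose key is absent from B got NaN in B's value columns -> set to 0.0
res.loc[~res["key"].isin(B["key"]), B.columns.drop("key")] = 0.0
print(res)
```

Only the positions created by the outer join become 0.0; the NaNs in u and w for "a"/"b" and in y for "b" are left alone.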