Problem Description

I have two dataframes with only somewhat overlapping indices and columns.

import numpy as np
import pandas as pd

old = pd.DataFrame(index=['A', 'B', 'C'],
                   columns=['k', 'l', 'm'],
                   data=abs(np.floor(np.random.rand(3, 3) * 10)))

new = pd.DataFrame(index=['A', 'B', 'C', 'D'],
                   columns=['k', 'l', 'm', 'n'],
                   data=abs(np.floor(np.random.rand(4, 4) * 10)))

I want to calculate the difference between them and tried

delta = new - old

This gives lots of NaNs wherever the indices and columns do not match. I would like to treat the absence of an index or column as zero (e.g. old.loc['D', 'n'] = 0). old will always be a subspace of new.
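To see where the NaNs come from, here is a minimal sketch with small hypothetical frames (fixed values instead of random ones, so the result is reproducible). Arithmetic between two DataFrames first aligns them on the union of their row and column labels, and any cell that exists in only one operand comes out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values in place of the random data from the question:
# old covers rows A-C and columns k-m; new adds row D and column n.
old = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['A', 'B', 'C'], columns=['k', 'l', 'm'], dtype=float)
new = pd.DataFrame(np.arange(10, 26, dtype=float).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['k', 'l', 'm', 'n'])

# '-' aligns both frames on the union of their labels before subtracting,
# so every cell that exists only in new (row D, column n) becomes NaN.
delta = new - old
print(delta)
```

Only the 3×3 overlap gets real numbers; the remaining seven cells (column n for A to C, and all of row D) are NaN.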

Any ideas?

Edit:
I guess I didn't explain it thoroughly enough. I don't want to fill the delta dataframe with zeroes. I want to treat missing indices and columns in old as if they were zeroes. I would then get the value of new.loc['D', 'n'] in delta instead of a NaN.
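The distinction in the edit can be made concrete. A minimal sketch, again with hypothetical fixed values: filling the result of new - old with zeros puts 0 at D/n, whereas expanding old to new's labels with zeros first leaves new's own value there. The reindex step (DataFrame.reindex with fill_value=0) is one way to do that expansion; it is not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values standing in for the random frames in the question.
old = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['A', 'B', 'C'], columns=['k', 'l', 'm'], dtype=float)
new = pd.DataFrame(np.arange(10, 26, dtype=float).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['k', 'l', 'm', 'n'])

# Filling the *result* with zeros: D/n becomes 0, which is not what is wanted.
filled_delta = (new - old).fillna(0)

# Treating the *missing cells of old* as zeros: give old the same labels as
# new, with 0 in the added cells, then subtract.
old_expanded = old.reindex(index=new.index, columns=new.columns, fill_value=0)
wanted_delta = new - old_expanded

print(filled_delta.loc['D', 'n'])  # 0.0
print(wanted_delta.loc['D', 'n'])  # 25.0, i.e. new.loc['D', 'n']
```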

Recommended Answer

Use DataFrame.sub with fill_value=0:

In [15]:
import numpy as np
import pandas as pd

old = pd.DataFrame(index=['A', 'B', 'C'],
                   columns=['k', 'l', 'm'],
                   data=abs(np.floor(np.random.rand(3, 3) * 10)))

new = pd.DataFrame(index=['A', 'B', 'C', 'D'],
                   columns=['k', 'l', 'm', 'n'],
                   data=abs(np.floor(np.random.rand(4, 4) * 10)))
delta = new.sub(old, fill_value=0)
delta

Out[15]:
   k  l  m  n
A  0  3 -9  7
B  0 -2  1  8
C -4  1  1  7
D  8  6  0  6
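A short check of what fill_value does, sketched with hypothetical fixed values. Before subtracting, fill_value=0 stands in for a cell that is missing on only one side of the alignment, so here it gives the same result as expanding old with zeros and subtracting. The pandas documentation notes one caveat: if the cell is missing in both operands, the result is still missing, as the small frames a and b (also hypothetical) illustrate.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values standing in for the random frames above.
old = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['A', 'B', 'C'], columns=['k', 'l', 'm'], dtype=float)
new = pd.DataFrame(np.arange(10, 26, dtype=float).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['k', 'l', 'm', 'n'])

# fill_value=0 replaces a one-sided missing cell with 0 before subtracting.
delta = new.sub(old, fill_value=0)

# Same result as padding old with zeros, because old's labels are a subset of new's.
expected = new - old.reindex(index=new.index, columns=new.columns, fill_value=0)
print(delta.equals(expected))  # True

# Caveat: a cell missing on *both* sides stays NaN.
a = pd.DataFrame({'x': [1.0, np.nan]}, index=['r1', 'r2'])
b = pd.DataFrame({'y': [1.0]}, index=['r1'])
result = a.sub(b, fill_value=0)
print(result)  # row r2 is all NaN; r1 has x = 1.0, y = -1.0
```

Because the question guarantees that old is a subspace of new, the both-missing case only arises if new itself contains NaNs in cells that old lacks.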

08-11 17:41