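I have a DataFrame df in which each row has some columns I want to subtract from (columns_to_sub), a label column called "absorb", and some columns I don't want to change. From the values in columns_to_sub I want to subtract the matching row of another DataFrame, which is indexed by the "absorb" label. Here is a non-working example of what I want: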

import pandas as pd
import numpy as np
data = np.hstack((np.random.randint(0,10,20).reshape(-1,1),np.random.rand(20,3)))
df = pd.DataFrame(data,columns=['absorb','a','b','c'])
columns_to_sub = ['a','b']

means = df.groupby('absorb')[columns_to_sub].mean()
#This result is not what I want, because the subtraction is strange
df[columns_to_sub] = df[columns_to_sub] - means.loc[df.absorb,columns_to_sub]


How can I fix this code?

Best answer

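You are very close. Just use .values on the means lookup. That strips the index from the right-hand side, so the subtraction is done by position instead of by index label: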

df[columns_to_sub] = df[columns_to_sub] - means.loc[df.absorb,columns_to_sub].values
>>> df
    absorb         a         b         c
0        2 -0.060540 -0.270233  0.416213
1        9  0.597084  0.136158  0.415023
2        1 -0.131393 -0.535288  0.158465
3        3  0.282902 -0.008801  0.872598
4        9 -0.236306 -0.337588  0.297589
5        6  0.000000  0.000000  0.283559
6        3  0.022021 -0.110693  0.671295
7        7  0.042000 -0.327157  0.736395
8        1  0.097912  0.119899  0.409241
9        1 -0.460052  0.280302  0.341200
10       1  0.002855 -0.013902  0.648113
11       1  0.490679  0.148989  0.626300
12       8  0.000000  0.000000  0.986039
13       3 -0.304923  0.119494  0.553210
14       0  0.000000  0.000000  0.626576
15       5  0.000000  0.000000  0.105102
16       2 -0.166760 -0.122624  0.750912
17       2  0.227300  0.392857  0.498822
18       7 -0.042000  0.327157  0.323361
19       9 -0.360778  0.201430  0.521043
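
The strange result in the question comes from pandas aligning the two operands on their index labels: df is indexed 0..19, while means.loc[df.absorb, ...] is indexed by the (repeated) absorb labels. As an alternative to .values, here is a minimal sketch that uses groupby(...).transform("mean"), which returns the group means already indexed like df. The seeded generator and the names expected and demeaned are assumptions made for this illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the question's random data (the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "absorb": rng.integers(0, 10, 20),
    "a": rng.random(20),
    "b": rng.random(20),
    "c": rng.random(20),
})
columns_to_sub = ["a", "b"]

# Approach from the answer: look up each row's group mean, then drop the index
# with .values so the subtraction is positional.
means = df.groupby("absorb")[columns_to_sub].mean()
expected = df[columns_to_sub] - means.loc[df["absorb"], columns_to_sub].values

# Alternative: transform("mean") broadcasts each group's mean back onto the
# original rows, so the result shares df's index and aligns without .values.
group_means = df.groupby("absorb")[columns_to_sub].transform("mean")
demeaned = df[columns_to_sub] - group_means

# Write the result back; column "c" and the label column are left untouched.
df[columns_to_sub] = demeaned
```

Both approaches should give the same numbers. The transform version avoids building the intermediate means table and does not rely on the row order of the lookup matching df.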
