Problem description
I am trying to create a new column from a groupby calculation. In the code below, I get the correct calculated value for each date (see group below), but when I try to create a new column (df['Data4']) from it, I get NaN. What I want is a new column in the dataframe holding the sum of Data3 for each date, applied to every row with that date. For example, 2015-05-08 appears in 2 rows (the total is 50 + 5 = 55), and I would like both of those rows to have 55 in the new column.
import pandas as pd
df = pd.DataFrame({
'Date' : ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05', '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
'Sym' : ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
'Data3': [5, 8, 6, 1, 50, 100, 60, 120]
})
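# Sum Data3 per date; the result has one entry per distinct Date
group = df['Data3'].groupby(df['Date']).sum()
# Data4 comes out as NaN in every row
df['Data4'] = group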
Recommended answer
You want to use transform. It returns a Series whose index is aligned to df, so you can add it directly as a new column:
In [74]:
df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120]})
df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')
df
Out[74]:
   Data2  Data3        Date   Sym  Data4
0     11      5  2015-05-08  aapl     55
1      8      8  2015-05-07  aapl    108
2     10      6  2015-05-06  aapl     66
3     15      1  2015-05-05  aapl    121
4    110     50  2015-05-08  aaww     55
5     60    100  2015-05-07  aaww    108
6    100     60  2015-05-06  aaww     66
7     40    120  2015-05-05  aaww    121
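For anyone wondering why the original assignment gave NaN: the sketch below illustrates it, assuming the same sample data as in the question. The grouped sum is indexed by the date strings, while df has the default integer index, and column assignment aligns on index labels, so no label matches. The names has_shared_labels, via_transform and via_map are only illustrative. Mapping the Date column through the grouped result is shown as one possible alternative to transform, not as the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# One entry per distinct date, indexed by the date strings.
group = df['Data3'].groupby(df['Date']).sum()

# df's index holds the integers 0..7 and group's index holds date strings,
# so the two share no labels and label-aligned assignment fills with NaN.
has_shared_labels = df.index.isin(group.index).any()

# transform returns one value per original row, keeping df's index.
via_transform = df.groupby('Date')['Data3'].transform('sum')

# Alternative: look up each row's date in the grouped result.
via_map = df['Date'].map(group)

print(has_shared_labels)
print(via_transform.equals(via_map))
```

Either Series can be assigned to df['Data4'], since both carry the same index as df.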